DETERMINING THE FUNCTIONS AND INTERACTIONS OF 
PROTEINS BY COMPARATIVE ANALYSIS 

Related Applications 

The present application is a continuation-in-part application ("CIP") of Patent 
Cooperation Treaty (PCT) International Application Serial No. PCT/US00/02246, filed in the 
U.S. receiving office on January 28, 2000, and this application claims the benefit of priority 
under 35 U.S.C. § 119(e) of U.S. Provisional Application Nos. 60/165,124 and 60/165,086, 
both filed November 12, 1999, and U.S. Provisional Application No. 60/179,531, filed February 
1, 2000. International Application Serial No. PCT/US00/02246 claims the benefit of priority 
under 35 U.S.C. § 119(e) of U.S. Provisional Application Serial No. 60/117,844, filed January 
29, 1999, U.S. Provisional Application Serial No. 60/118,206, filed February 1, 1999, U.S. 
Provisional Application Serial No. 60/126,593, filed March 26, 1999, U.S. Provisional 
Application Serial No. 60/134,093, filed May 14, 1999, and U.S. Provisional Application 
Serial No. 60/134,092, filed May 14, 1999. Each of the aforementioned applications is 
explicitly incorporated herein by reference in its entirety and for all purposes. 

TECHNICAL FIELD 
This invention generally relates to genetics and microbiology. The invention 
provides novel methods to identify the function of and relationships between nucleic acid and 
protein sequences. The methods are particularly useful for identifying genes and 
polypeptides having potential therapeutic relevance in organisms, e.g., microorganisms, such 
as Mycobacterium tuberculosis. The invention also provides Mycobacterium tuberculosis 
genes and polypeptides found by these methods. These genes and polypeptides are useful as 
potential drug targets. 

BACKGROUND 

The determination of the functions of and relationships between nucleic acid 
and protein sequences has traditionally relied on either the study of homology and sequence 
identity with genes and proteins of known function or, in the absence of informative 
homology, laborious experimental work. The availability of many complete genome 
sequences has made it possible to develop new strategies for computational determination of 
protein functions. Several methods have been developed which can predict the general 
function of proteins by analyzing their functional relationships rather than sequence 
similarity. Generally, two proteins can be considered functionally related when they form 
part of the same biochemical pathway or biological process. For example, although malate 
dehydrogenase is not homologous to pyruvate carboxylase, and the two enzymes do not 
catalyze the same reaction, they are functionally related because they both catalyze steps of a 
common biochemical pathway, namely the tricarboxylic acid cycle. 

New methods that can establish such functional relationships could provide 
valuable information on the functions of uncharacterized nucleic acid and protein sequences. 

The disease tuberculosis, caused by Mycobacterium tuberculosis (MTB), is one 
of the world's leading killers. The World Health Organization estimates that 30 million deaths 
from pulmonary tuberculosis will occur during this decade. Alarming reports on the 
emergence of drug-resistant strains of this bacterium underscore the importance of the search 
for new therapeutic agents. Identifying the function of every protein produced by MTB will 
provide researchers with promising new targets for anti-tuberculosis drug design. 

SUMMARY 

The invention provides novel methods for characterizing the function of 
nucleic acids and polypeptides. The invention provides a novel method for identifying a 
nucleic acid or a polypeptide sequence that may be a target for a drug. The invention provides 
a novel method for identifying a nucleic acid or a polypeptide sequence that may be essential 
for the growth or viability of an organism. The characterization is based on use of methods of 
the invention comprising algorithms that can identify functional relationships between diverse 
sets of non-homologous nucleic acid and polypeptide sequences. Characterization of nucleic 
acid and protein sequences can be the basis for the development of compositions that can 
interact with those nucleic acids and polypeptides. For example, such characterization can 
provide a basis for screening methods. Such characterization may allow use of these 
sequences as targets for drug discovery. Discovery of such compositions can provide the 
basis for the design of novel drugs, particularly if the characterized sequences are derived 
from a pathogen. 

WO 01/35317    PCT/US00/31152 

The invention provides a method for identifying a nucleic acid or a 
polypeptide sequence that may be a target for a drug comprising the following steps: (a) 
providing a first nucleic acid or a polypeptide sequence that is known to be a drug target; (b) 
providing at least one algorithm selected from the group consisting of a "domain fusion" 
method, a "phylogenetic profile" method and a "physiologic linkage" method, wherein the 
algorithm is capable of analyzing a functional relationship between nucleic acid or polypeptide 
sequences; and, (c) comparing the first nucleic acid or the polypeptide drug target sequence 
to a plurality of sequences using at least one of the algorithms as set forth in step (b) to 
identify a second sequence that has a functional relationship to the first sequence, thereby 
identifying a nucleic acid or a polypeptide sequence that may be a target for a drug. 

The invention provides a method for identifying a nucleic acid or a 
polypeptide sequence that may be essential for the growth or viability of an organism 
comprising the following steps: (a) providing a first nucleic acid or a polypeptide sequence 
that is known to be essential for the growth or viability of an organism; (b) providing at least 
one algorithm capable of analyzing a functional relationship between nucleic acid or 
polypeptide sequences selected from the group consisting of a "domain fusion" method, a 
"phylogenetic profile" method and a "physiologic linkage" method; and, (c) comparing the 
first nucleic acid or the polypeptide sequence to a plurality of sequences using at least one of 
the algorithms as set forth in step (b) to identify a second sequence that has a functional 
relationship to the first sequence, thereby identifying a nucleic acid or a polypeptide 
sequence that may be essential for the growth or viability of an organism. 

In one aspect of the methods of the invention, the drug is an anti-microbial 
drug. In another aspect, the first nucleic acid or polypeptide sequence is derived from a 
pathogen. The pathogen can be a microorganism, such as Mycobacterium tuberculosis 
(MTB). 

The plurality of sequences used to identify a second sequence can comprise a 
database of the gene sequences of an entire genome of an organism. The plurality of 
sequences used to identify a second sequence can comprise a database of the gene sequences 
derived from a pathogen. 

In one aspect of the methods of the invention, the "phylogenetic profile" 
method algorithm comprises (a) obtaining data, comprising a list of proteins from at least two 
genomes; (b) comparing the list of proteins to form a protein phylogenetic profile for each 
protein, wherein the protein phylogenetic profile indicates the presence or absence of a 
protein belonging to a particular protein family in each of the at least two genomes based on 
homology of the proteins; and (c) grouping the list of proteins based on similar profiles, 
wherein proteins with similar profiles are indicated to have a functional relationship. The 
phylogenetic profile can be in the form of a vector, matrix or phylogenetic tree. The 
"phylogenetic profile" method can further comprise determining the significance of 
homology between the proteins by computing a probability (p) value threshold. The 
probability can be set with respect to the value 1/NM, based on the total number of sequence 
comparisons that are to be performed, wherein N is the number of proteins in the first 
organism's genome and M in all other genomes. The presence or absence of a protein 
belonging to a particular protein family in each of the at least two genomes can be 
determined by calculating an evolutionary distance. The evolutionary distance can be 
calculated by: (a) aligning two sequences from the list of proteins; (b) determining an 
evolution probability process by constructing a conditional probability matrix $p(aa \to aa')$, 
where aa and aa' are any amino acids, said conditional probability matrix being constructed 
by converting an amino acid substitution matrix from a log odds matrix to said conditional 
probability matrix; (c) accounting for an observed alignment of the constructed conditional 
probability matrix by taking the product of the conditional probabilities for each aligned pair 
during the alignment of the two sequences, represented by 

$$P(p) = \prod_{n} p(aa_n \to aa'_n);$$

and, (d) determining an evolutionary distance $\alpha$ from powers equation 
$p' = p^{\alpha}(aa \to aa')$, maximizing for P. The conditional probability matrix can be 
defined by a Markov process with substitution rates, over a fixed time interval. The 
conversion from an amino acid substitution matrix to a conditional probability matrix can be 
represented by: 

$$P(i \to j) = p_j \, 2^{\mathrm{BLOSUM62}_{ij}/2}$$

where BLOSUM62 is an amino acid substitution matrix, and $P(i \to j)$ is the 
probability that amino acid i is replaced by amino acid j through point mutations according to 
BLOSUM62 scores. In one aspect, the $p_j$'s are the abundances of amino acid j and are 
computed by solving a plurality of linear equations given by the normalization condition that: 

$$\sum_{j} P(i \to j) = 1.$$

In alternative aspects of the methods of the invention, the "physiologic 
linkage" method algorithm identifies proteins and nucleic acids that participate in a common 
functional pathway; identifies proteins and nucleic acids that participate in the synthesis of a 
common structural complex; and, identifies proteins and nucleic acids that participate in a 
common metabolic pathway. 

In one aspect of the invention, the "domain fusion" method algorithm 
comprises (a) aligning a first primary amino acid sequence of multiple distinct 
non-homologous polypeptides to a second primary amino acid sequence of a plurality of proteins; 
and, (b) for any alignment found between the first primary amino acid sequences of all of 
such multiple distinct non-homologous polypeptides and at least one protein of the second 
primary amino acid sequences, outputting an indication identifying the aligned second 
primary amino acid sequence as an indication of a functional link between the aligned first 
and second polypeptide sequences. The aligning can be performed by an algorithm selected 
from the group consisting of a Smith-Waterman algorithm, a Needleman-Wunsch algorithm, a 
BLAST algorithm, a FASTA algorithm, and a PSI-BLAST algorithm. The multiple distinct 
non-homologous polypeptides can be obtained by translating a nucleic acid sequence from a 
genome database. The plurality of proteins can have a known function. At least one of the 
multiple distinct non-homologous polypeptides can have a known function. At least one of 
the multiple distinct non-homologous polypeptides can have an unknown function. The 
alignment can be based on the degree of homology of the multiple distinct non-homologous 
polypeptides to the plurality of proteins. The "domain fusion" method can comprise 
determining the significance of the aligned and identified second primary amino acid 
sequence by computing a probability (p) value threshold. The probability threshold can be 
set with respect to the value 1/NM, based on the total number of sequence comparisons that 
are to be performed, wherein N is the number of proteins in a first organism's genome and M 
in all other genomes. The "domain fusion" method can further comprise filtering excessive 
functional links between one first primary amino acid sequence of multiple distinct 
non-homologous polypeptides and an excessive number of other distinct non-homologous 
polypeptides for any alignment found between the first primary amino acid sequences of the 
distinct non-homologous polypeptides and at least one of the second primary amino acid 
sequences of the plurality of proteins. 

The invention provides a computer program product, stored on a 
computer-readable medium, for identifying a nucleic acid or a polypeptide sequence that may be a 
target for a drug, the computer program product comprising instructions for causing a 
computer system to be capable of: (a) inputting a first nucleic acid or a polypeptide sequence 
that is known to be a drug target; (b) accessing at least one algorithm capable of analyzing a 
functional relationship between nucleic acid or polypeptide sequences selected from the 
group consisting of a "domain fusion" method, a "phylogenetic profile" method and a 
"physiologic linkage" method; and (c) comparing the first nucleic acid or the polypeptide 
drug target sequence to a plurality of sequences using at least one of the algorithms set forth 
in step (b) to identify a second sequence that has a functional relationship to the first 
sequence and generating an output identifying a nucleic acid or a polypeptide sequence that 
may be a target for a drug. 

The invention provides a computer program product, stored on a 
computer-readable medium, for identifying a nucleic acid or a polypeptide sequence that may be 
essential for the growth or viability of an organism, the computer program product 
comprising instructions for causing a computer system to be capable of: (a) providing a first 
nucleic acid or a polypeptide sequence that is known to be essential for the growth or 
viability of an organism; (b) accessing at least one algorithm capable of analyzing a functional 
relationship between nucleic acid or polypeptide sequences selected from the group 
consisting of a "domain fusion" method, a "phylogenetic profile" method and a "physiologic 
linkage" method; and, (c) comparing the first nucleic acid or the polypeptide sequence to a 
plurality of sequences using at least one of the algorithms set forth in step (b) to identify a 
second sequence that has a functional relationship to the first sequence and generating an 
output identifying a nucleic acid or a polypeptide sequence that may be essential for the 
growth or viability of an organism. 

The invention provides a computer system, comprising: (a) a processor; and, 
(b) a computer program product of the invention. 

All publications, patents, patent applications, GenBank sequences and ATCC 
deposits, cited herein are hereby expressly incorporated by reference for all purposes. 

The details of one or more embodiments of the invention are set forth in the 
accompanying drawings and the description below. Other features, objects, and advantages 
of the invention will be apparent from the description and drawings, and from the claims. 



DESCRIPTION OF DRAWINGS 

Figure 1 is an example of functional linkages predicted between InhA (Rv 
1484) and other TB genes. 

Figure 2 is an example of predicted functional linkages between embB (Rv 
3795), which is a target of the drug ethambutol, and other TB genes using the phylogenetic 
profile method. 

Figure 3 is an example of predicted functional linkages between five TB genes 
having homology to penicillin binding proteins and other TB genes. 

Figure 4 shows that gcpE (Rv 2868C) is predicted to be functionally linked to cell 
wall metabolism. 

Figure 5 shows predicted functional linkages of htrA (Rv 1223C) with other 
TB genes. 

Like reference symbols in the various drawings indicate like elements. 



DETAILED DESCRIPTION 

The present invention provides novel methods for identifying the relationships 
between and the function of nucleic acid and polypeptide sequences. The methods of the 
invention identify novel genes and polypeptides on the basis of their functional linkage to 
other proteins whose biological function or processes are known or inferred by homology. 

The genes and polypeptides identified by the methods of the invention can be 
used in screening methods for the identification of compositions which, by binding or 
otherwise interacting with the gene or polypeptide, are capable of modifying the physiology 
and growth of an organism. The compositions identified by these screening methods are 
useful as drugs and pharmaceuticals. Thus, genes and polypeptides identified by the methods 
of the invention, including the genes and polypeptides identified herein, can be used as 
potential drug targets. 

One aspect of the invention provides methods for identifying the function of 
genes and polypeptides from Mycobacterium tuberculosis (MTB or TB). Based on this new 
functional determination, these genes and polypeptides can be used to screen for 
compositions capable of modifying the physiology and growth of Mycobacterium 
tuberculosis (TB). Thus, genes and polypeptides identified by the methods of the invention, 
including the genes and polypeptides identified herein, can be used as targets in screening 
protocols and can be useful as potential drug targets. 

The functions of the TB genes and polypeptides of the present invention were 
identified using the methods of the invention; i.e., they were identified on the basis of their 
functional linkage to other proteins whose biological function or processes were known by 
experiment or inferred by homology. TB genes and polypeptides that are functionally linked 
to genes known to be involved in pathogenesis or organism survival are potential drug 
targets. Genes or polypeptides that are associated with TB pathogenesis or survival, or that are 
important or unique to TB biochemical pathways, are potential drug targets. TB genes and 
polypeptides that have no homologues identified in humans are potential drug targets. The 
function of many of the TB genes and polypeptides identified is based on the genes or 
polypeptides with which they are functionally linked. 

The fact that TB genes whose function was identified using the methods of the invention 
are effectively targeted by a drug (i.e., they can act as bona fide drug targets) provides proof 
of principle that the invention's methods for identifying functionally linked genes can 
identify TB genes and polypeptides that are drug targets. Further confirmation that the genes 
identified by the methods of the invention include bona fide drug targets can be supported by 
the fact that genes already known to be targets for drugs have been independently identified, 
or "re-discovered," by the invention's methods. 

The novel TB genes described herein are identified as being functionally 
related or linked to other genes, including other TB genes, such as a known TB drug target 
(e.g., the InhA polypeptide, which is a target of isoniazid). These functional linkages are 
established using mathematical algorithms. The assignment or inference of a function to TB 
genes and polypeptides based on their linkage or relatedness to other genes and polypeptides 
is described in U.S. provisional application serial no. 60/165,086. Potential TB drug targets 
are identified by several methods discussed herein and in further detail in U.S. provisional 
application serial no. 60/134,092. Through the use of these methods, TB genes and 
polypeptides have been identified as potential drug targets and are illustrated in Tables 1 and 
2, and Figures 1 to 5. The nucleotide and amino acid sequences of these potential drug 
targets are illustrated in Tables 3 and 4, respectively (see below). 

The phrases "functional link" and "functionally related," and grammatical 
variations thereof, when used in reference to genes or polypeptides, mean that the genes or 
polypeptides are predicted to be linked or related. A particular example of functionally 
related or linked proteins is where two proteins participate in a biochemical or metabolic 
pathway (e.g., malate dehydrogenase and fumarase, which are both present in the TCA 
cycle). Thus, although functionally linked or related proteins may not have sequence 
homology to each other, they are linked by virtue of their participation in the same 
biochemical pathway. Other examples of linked or related polypeptides are where two 
polypeptides are part of a protein complex, physically interact, or act upon one another. 

The "domain fusion" or "Rosetta Stone" method searches protein sequences 
across all known genomes and identifies proteins that are separate in one organism but joined 
as intramolecular domains into one larger protein in another organism. Such proteins that are 
separate in some organisms but joined in others often carry out related or sequential functions 
and are therefore functionally linked. 

The phylogenetic profile method compares protein sequences across all 
known genomes and analyzes the pattern of inheritance of each protein across the different 
organisms. Proteins that have similar patterns of inheritance, either acquired or lost as a part 
of a group of proteins through evolution, are functionally linked. The gene proximity method 
identifies genes that remain physically close or "clustered" throughout evolution and are 
therefore functionally linked. 
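
As a minimal sketch of the profile comparison described above, the following Python 
fragment builds a presence/absence vector for each protein across a set of genomes and links 
proteins whose vectors match. The protein and genome names, the function names 
build_profiles and link_by_profile, and the max_mismatches cutoff are illustrative 
assumptions and are not taken from this description; homology detection is assumed to have 
been performed beforehand. 

```python
from itertools import combinations

def build_profiles(presence, genomes):
    """Return {protein: tuple of 0/1 flags}, one flag per genome in `genomes`.

    `presence` maps a protein name to the set of genomes in which a homolog
    was found (hypothetical input; homology search is done elsewhere).
    """
    return {
        protein: tuple(1 if g in found_in else 0 for g in genomes)
        for protein, found_in in presence.items()
    }

def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def link_by_profile(profiles, max_mismatches=0):
    """Pair up proteins whose profiles differ in at most `max_mismatches` genomes."""
    links = []
    for (p1, v1), (p2, v2) in combinations(sorted(profiles.items()), 2):
        if hamming(v1, v2) <= max_mismatches:
            links.append((p1, p2))
    return links

# Toy data: four hypothetical proteins across four hypothetical genomes.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2", "G4"},
    "P4": {"G1", "G2", "G3"},
}
profiles = build_profiles(presence, genomes)
links = link_by_profile(profiles)
```

Here only P1 and P2 share an identical profile; raising max_mismatches loosens the grouping 
and would also link P4 to both of them. 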

A particular example of the identification of a potential TB drug target would 
be to identify a TB gene or polypeptide functionally linked to a known drug target. Anti-TB 
drugs include isoniazid, rifampicin, ethambutol, streptomycin, pyrazinamide, and 
thiacetazone. Isoniazid is believed to act through enoyl-acyl reductase InhA, 
resulting in mycolic acid biosynthesis inhibition. Thus, TB genes or polypeptides 
functionally linked to enoyl-acyl reductase InhA are potential drug targets; see Figure 1, 
which shows an analysis of InhA, the target for isoniazid, the most widely used 
anti-tuberculosis drug, and functional linkages to a set of genes mostly known or hypothesized to 
be involved in cell wall-related processes and lipid and polyketide metabolism. Particular 
examples of the identification of several TB genes and polypeptides that are functionally 
related to the targets of these anti-TB drugs are shown in Figures 1 to 5. 

"Domain Fusion" or "Rosetta Stone" Method 

The "domain fusion" or "Rosetta Stone" method compares protein sequences 
across known nucleic acid databases (e.g., known genomes) to identify genes and proteins 
that are separate entities in one organism but are joined into one larger multidomain protein 
in another organism. In such cases, the two separate proteins often carry out related or 
sequential functions or form part of a larger protein complex. Therefore, the general function 
of one component (e.g., one or more of the unknown proteins) can be inferred from the 
known function of the other component. In addition, merely identifying links between 
proteins using the method described herein provides valuable information (e.g., usefulness as 
a target for an antibacterial drug), regardless of whether the function of one or more of the 
proteins used to form the link(s) is known. Because the two components do not have similar 
amino acid sequences, the function of one could not be inferred from the other on the basis of 
sequence similarity alone. 

The methods for identifying drug targets (e.g., TB drug targets) described 
herein (e.g., the "Rosetta Stone Method") are based on the idea that proteins that participate 
in a common structural complex, metabolic pathway or biological process, or that have closely 
related physiological functions, are functionally linked. In addition, these methods also are 
capable of identifying proteins that interact physically with one another. Functionally linked 
proteins in one organism can often be found fused into a single polypeptide chain in a 
different organism. Similarly, fused proteins in one organism can be found as individual 
proteins in other organisms. For example, in a first organism one might identify two 
un-linked proteins "A" and "B" with unknown function. In another organism, one may find a 
single protein "AB" with a part that resembles "A" and a part that resembles "B". Protein 
AB allows one to predict that "A" and "B" are functionally related. 

The functional activity of each distinct protein in the "Rosetta Stone" method 
need not be known prior to performing the method (i.e., the function of A, B, or AB need not 
be known). Using the "Rosetta Stone" method to compare and analyze several unknown 
protein sequences can provide information regarding relationships of each protein absent 
knowledge about the functional activity of the initially analyzed proteins themselves. For 
example, the information (i.e., the links) can indicate that the proteins are part of 
a common pathway, function in a related process or physically interact. Such information 
need not be based on the biological function of the individual proteins. 

These methods can provide information regarding links between previously 
un-linked proteins that function, for example, in a concerted process. For example, a marker 
for a particular disease state may be identified by the presence or absence of a protein (e.g., 
Her2/neu in breast cancer detection). Links (i.e., information) identified by the method, 
which link proteins "B" and "C" to such a marker, suggest that proteins "B" and "C" are 
related by function, physical interaction or part of a common biological pathway with the 
marker. Such information is useful in designing screening methods and identifying drug 
targets (e.g., TB drug targets), making diagnostics, and designing therapeutics. 

In one approach, the "Rosetta Stone" method is performed by sequence 
comparison that searches for incomplete "triangle relationships" between, for example, three 
proteins, i.e., for two proteins A' and B' that are different from one another but similar in 
sequence to another protein AB. Completing the triangle relationship provides useful 
information regarding the proteins' biological function(s), functional interaction, pathway 
relationships or physical relationships with other proteins in the "triangle." 
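
The triangle search can be sketched as follows, assuming the alignment step has already 
produced a list of hits. In this illustrative Python fragment each hit is a tuple (probe, 
subject, start, end) giving the region of a database protein matched by a probe; the tuple 
layout, the function name rosetta_links and the toy coordinates are assumptions made for the 
example. The sketch links two probes when they match non-overlapping regions of the same 
subject; it does not itself verify that the two probes lack homology to one another. 

```python
from collections import defaultdict
from itertools import combinations

def rosetta_links(hits):
    """Find probe pairs that align to separate regions of one subject protein.

    hits: iterable of (probe_id, subject_id, start, end), where start/end are
    inclusive coordinates of the aligned region on the subject (assumed format).
    Returns {(probe_a, probe_b): sorted list of subject ids linking the pair}.
    """
    by_subject = defaultdict(list)
    for probe, subject, start, end in hits:
        by_subject[subject].append((probe, start, end))

    links = defaultdict(set)
    for subject, regions in by_subject.items():
        for (pa, sa, ea), (pb, sb, eb) in combinations(regions, 2):
            if pa == pb:
                continue  # two hits from the same probe do not form a link
            # Keep the pair only if the two aligned regions do not overlap.
            if ea < sb or eb < sa:
                links[tuple(sorted((pa, pb)))].add(subject)
    return {pair: sorted(subjects) for pair, subjects in links.items()}

# Toy hits: A and B match distinct parts of AB; C overlaps both regions.
hits = [
    ("A", "AB", 1, 120),
    ("B", "AB", 130, 300),
    ("C", "AB", 100, 200),
    ("A", "X", 5, 110),
]
links = rosetta_links(hits)
```

In this toy input only A and B are linked, through the candidate "Rosetta Stone" protein AB. 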

Either nucleotide sequences or amino acid sequences can be used in the 
methods for identifying functionally related or linked genes or polypeptides. Where a 
nucleic acid sequence is to be used, it can first be translated to an amino 
acid sequence. Such translation may be performed in all frames if the coding sequence is not 
known. Programs that can translate a nucleic acid sequence are known in the art. In 
addition, although for simplicity the description of this method discusses the use of a "pair" of 
proteins in the determination of a "Rosetta Stone" protein, more than two may be used (e.g., 3, 
4, 5, 10, 100 or more proteins). Accordingly, one can analyze chains of linked proteins, such 
as "A" linked by a Rosetta Stone protein to "B" linked by a Rosetta Stone protein to "C", etc. 
By this method, groups of functionally related proteins can be found and their function 
identified. 
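
One way to collect such chains into groups is to treat each link as an edge in a graph and 
take connected components. The Python sketch below does this with a small union-find 
structure; the function name group_linked and the example links are hypothetical, and 
connected components are offered here only as one plausible reading of "chains of linked 
proteins." 

```python
def group_linked(links):
    """Group proteins into sets connected by chains of pairwise links.

    links: iterable of (protein_a, protein_b) pairs.
    Returns a sorted list of sorted member lists, one per group.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in groups.values())

# "A" is linked to "B" and "B" to "C", so all three fall into one group.
groups = group_linked([("A", "B"), ("B", "C"), ("D", "E")])
```

With these example links the result contains two groups, one holding A, B and C and one 
holding D and E. 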

A method can start with identifying the primary amino acid sequence for a 
plurality of proteins whose functional relationship is to be determined (e.g., protein A' and 
protein B'). A number of source databases are available, as described above, that contain 
a nucleic acid sequence and/or a deduced amino acid sequence for use with the first 
step. The plurality of sequences (the "probe sequences") are then used to search a sequence 
database, e.g., GenBank (NCBI, NLM, NIH), PFAM (a large collection of multiple sequence 
alignments and hidden Markov models covering many common protein domains; 
Washington University, St. Louis, MO) or ProDom (a database based on recursive PSI-BLAST 
searches and designed as a tool to help analyze domain arrangements of proteins and 
protein families; see, e.g., Corpet (1999) Nucleic Acids Res. 27:263-267), either 
simultaneously or individually. Every protein in the sequence database is examined for its 
ability to act as a "Rosetta Stone" protein (i.e., a single protein containing polypeptide 
sequences or domains from both protein A' and protein B'). A number of different methods 
of performing such sequence searches are known in the art. Such sequence alignment 
methods include, for example, BLAST (see, e.g., Altschul (1990) J. Mol. Biol. 215:403-410), 
BLITZ (MPsrch) (see, e.g., Brenner (1995) Trends Genet. 11:330-331; and infra), and 
FASTA (see, e.g., Pearson (1988) Proc. Natl. Acad. Sci. USA 85(8):2444-2448; and infra). 
The probe sequence can be any length (e.g., about 50 amino acid residues to about 1000 
amino acid residues). 

Probe sequences (e.g., polypeptide sequences or domains) found in a single 
protein (e.g., an "AB" multidomain protein) are defined as being "linked" by that protein. 
Where the probe sequences are used individually to search the sequence database, one can 
mask those segments having homology to the first probe sequence found in the proteins of 
the sequence database prior to searching with the subsequent probe sequence. In this way, 
one eliminates any potential overlapping sequences between the two or more probe 
sequences. 
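
The masking step can be illustrated with a short Python sketch. It assumes that the 
segments matched by the first probe are already known as 0-based, end-exclusive coordinate 
pairs and that masked residues are replaced with the character "X"; the coordinate 
convention, the mask character and the function name mask_segments are assumptions made 
for this example. 

```python
def mask_segments(sequence, segments, mask_char="X"):
    """Replace residues inside each (start, end) segment with `mask_char`.

    Coordinates are 0-based and end-exclusive (an assumed convention);
    segments that run past either end of the sequence are clipped.
    """
    residues = list(sequence)
    for start, end in segments:
        for i in range(max(start, 0), min(end, len(residues))):
            residues[i] = mask_char
    return "".join(residues)

# Hide the region matched by the first probe before searching with the second.
masked = mask_segments("MKTAYIAKQR", [(2, 5)])
```

A second probe searched against the masked sequence can then only align to residues 
outside the region already claimed by the first probe. 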

The linked proteins can then be further compared for similarity with one 
another by amino acid sequence comparison. Where the sequences are identical or have high 
homology, such a finding can be indicative of the formation of homo-dimers, -trimers, etc. 
Typically, "Rosetta Stone"-linked proteins are only kept when the linked proteins show no 
homology to one another (e.g., hetero-dimers, -trimers, etc.). 

In another method for identifying functional linkages, a potential fusion 
protein lacking any functional information that is suspected of having two or more domains 
(e.g., a potential "Rosetta Stone" protein) may be used to search for related proteins. In this 
method, the primary amino acid sequence of the fusion protein is determined and used as a probe 
sequence. This probe sequence is used to search a sequence database (e.g., GenBank, PFAM 
or ProDom). Every protein in the sequence database is examined for homology to the 
potential fusion protein (i.e., multiple proteins containing polypeptide sequences or domains 
from the potential fusion protein). A number of different methods of performing such 
sequence searches are known in the art, e.g., BLAST, BLITZ (Biocomputing Research Unit, 
University of Edinburgh, Scotland; the "MPsrch program" performs comparisons of protein 
sequences against the Swiss-Prot protein sequence database using the Smith and Waterman 
best local similarity algorithm), and FASTA. 

Probe sequences found in more than one protein (e.g., A' and B' proteins) are
defined as being "linked" so long as at least one protein per domain containing that domain
but not the other is also identified. In other words, at least one protein or domain of the
plurality of proteins must also be found alone in the sequence database. This verifies that the
protein or domain is not an integral part of a first protein but rather a second independent
protein having its own functional characteristics.
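
The linking rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each database protein is assumed to have already been reduced to the set of domain identifiers it contains, and the names `domains_by_protein` and `rosetta_links` are hypothetical, not part of the described method.

```python
from itertools import combinations

def rosetta_links(domains_by_protein):
    """Return {(domain_a, domain_b): [fusion proteins]} for linked domain pairs."""
    links = {}
    for protein, domains in domains_by_protein.items():
        # Every pair of domains sharing one protein is a candidate link.
        for a, b in combinations(sorted(domains), 2):
            links.setdefault((a, b), []).append(protein)

    def found_without(wanted, other):
        # True if some protein carries `wanted` but not `other`.
        return any(wanted in d and other not in d
                   for d in domains_by_protein.values())

    # Keep a pair only if each domain also occurs without the other.
    return {pair: fusions for pair, fusions in links.items()
            if found_without(pair[0], pair[1]) and found_without(pair[1], pair[0])}

# Hypothetical toy database.
example = {
    "fusion_AB": {"A", "B"},   # candidate "Rosetta Stone" protein
    "prot_A": {"A"},
    "prot_B": {"B"},
    "fusion_CD": {"C", "D"},   # D never occurs on its own, so C-D is not linked
    "prot_C": {"C"},
}
linked = rosetta_links(example)
```

In this toy input only the A-B pair is reported, because domain D is never observed apart from C.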

Statistical methods can be used to judge the significance of possible matches.
The statistical significance of an alignment score is described by the probability, P, of
obtaining a higher score when the sequences are shuffled. One way to compute a P value
threshold is to first consider the total number of sequence comparisons that are to be
performed. For example, if there are N proteins in E. coli and M in all other genomes, this
number is N x M. If comparing this number of random sequences would be expected to yield
one pair with a P value of 1/NM by chance, then this value is set as the threshold.
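
As a small worked illustration of this threshold, the following Python sketch computes 1/(N x M) and applies it to a P value. The function names and the protein counts used in the example are hypothetical and chosen only to show the arithmetic.

```python
def p_value_threshold(n_query_proteins, m_other_proteins):
    """Threshold at which one chance match is expected among N x M comparisons."""
    comparisons = n_query_proteins * m_other_proteins
    return 1.0 / comparisons

def is_significant(p_value, n_query_proteins, m_other_proteins):
    """A match is kept only if its P value falls below the 1/(N x M) threshold."""
    return p_value < p_value_threshold(n_query_proteins, m_other_proteins)

# Hypothetical sizes: 4,000 query proteins against 100,000 database proteins.
threshold = p_value_threshold(4000, 100000)
```

With these assumed counts there are 4 x 10^8 comparisons, so a match would need a P value below 2.5 x 10^-9 to pass.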

This method provides information regarding which proteins are functionally
related (e.g., related biological functions, common structural complexes, metabolic pathways
or biological processes), a subset of which physically interact in an organism.

Alignment Algorithms

To align sequences, a number of different procedures can be used that produce
a good match between the corresponding residues in the sequences. Typically, the
Smith-Waterman (Smith (1981) Adv. Appl. Math. 2:482) or Needleman-Wunsch
(Needleman (1970) J. Mol. Biol. 48:443) algorithm is used; however, other, faster
procedures such as BLAST, FASTA, PSI-BLAST (a version of BLAST for finding protein
families), or others known in the art (see infra discussion), can be used.

Filtering Methods 

The Rosetta Stone Method provides at least two pieces of information. First,
the method provides information regarding which proteins are functionally related. Second,
the method provides information regarding which proteins are physically related. Each of
these two pieces of information has different sources of error and prediction. The first type
of error is introduced by protein sequences that occur in many different proteins and are
paired with many other protein sequences. The second type of error is introduced because
there are often multiple copies of similar proteins, called paralogs, in a single organism. In
general, the "Rosetta Stone" method predicts functionally related proteins well, with no
filtering of results required. However, it is possible to filter the error associated with either
the first or second type of information.

The invention recognizes that a few domains are linked to an excessive
number of other domains by a "Rosetta Stone" protein. For example, 95% of the domains
are linked to fewer than 25 other domains. However, some domains, e.g., the Src Homology
3 (SH3) domain or ATP-binding cassette (ABC) domains, link to more than a hundred other
domains. These links were filtered by removing all links generated involving these 5% of
domains (i.e., the domains linked to more than 25 other domains). For example, in E. coli,
without filtering, 3531 links were identified using the domain-based analysis, but after
filtering only 749 links were identified. This method improved prediction of functionally
related proteins by 28% and physically related proteins by 47%. Accordingly, there are a
number of ways to filter the results to improve the significance of the functional links. As
described above, as the number of functional links increases there is a higher
chance of finding a "Rosetta Stone" protein. By reducing the excessively linked proteins one
reduces the number of "Rosetta Stone" proteins found by chance, thereby increasing the
significance of a functional link.
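
The filtering step can be sketched as follows. This is a minimal Python illustration, assuming links are available as pairs of domain identifiers; the function name `filter_promiscuous`, the `max_partners` parameter and the toy domain names are hypothetical.

```python
from collections import defaultdict

def filter_promiscuous(links, max_partners=25):
    """Drop every link that involves a domain linked to more than max_partners others."""
    partners = defaultdict(set)
    for a, b in links:
        partners[a].add(b)
        partners[b].add(a)
    promiscuous = {d for d, p in partners.items() if len(p) > max_partners}
    return [(a, b) for a, b in links
            if a not in promiscuous and b not in promiscuous]

# Toy input: a hypothetical "SH3" domain linked to 30 partners, plus two ordinary links.
toy_links = [("SH3", f"dom{i}") for i in range(30)]
toy_links += [("kinase", "dom0"), ("dom1", "dom2")]
kept = filter_promiscuous(toy_links)
```

Only the two ordinary links survive in this example, since every other link involves the excessively linked domain.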

Error introduced by multiple paralogs of linked proteins should have little
effect on functional prediction, as paralogs usually have very similar function, but will affect
the reliability of prediction of protein-protein interactions. This error is calculated for
each linked protein pair, and can be estimated roughly as:

$$\text{Fractional Error} = 1 - \frac{1}{\sqrt{N}},$$

where N is the number of paralogous protein pairs (e.g., A linked to B, A' linked to
B', A linked to B', and A' linked to B, in the case that A and A' are paralogs, as are B
and B', and the linking protein is AB as above).

The error can also be estimated as 1-T, where T is the mean percent of
potential true positives calculated for all domain pairs in an organism. For each domain pair
linked by a Rosetta Stone protein, there are n proteins with the first domain but not the
second, and m proteins with the second domain but not the first. The percent of true
positives T is therefore estimated as the smaller of n or m divided by n times m. As the
estimate T can be calculated for each set of linked domains, it can describe the confidence in
any particular predicted interaction.
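
A short Python sketch of this estimate is given here for illustration. The function names are hypothetical; the arithmetic follows the description above (the smaller of n or m, divided by n times m).

```python
def true_positive_fraction(n, m):
    """Estimate T: the smaller of n or m divided by n times m."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be at least 1")
    return min(n, m) / (n * m)

def interaction_error(n, m):
    """Estimated error in a predicted physical interaction, 1 - T."""
    return 1.0 - true_positive_fraction(n, m)
```

For example, with two paralogs carrying each domain (n = 2, m = 2), T is 0.5 and the estimated error is 0.5; with a single protein on each side the estimated error is zero.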

In addition, error in functional links can be caused by small conserved
regions or repeated common amino acid sequences being repeatedly identified in a "Rosetta
Stone" protein by a plurality of distinct non-homologous polypeptides. To reduce this error,
the percent of identity between the "Rosetta Stone" and the distinct non-homologous
polypeptide can be measured. Alignment percentages of about 50% to about 90%, or,
alternatively, about 75%, between the "Rosetta Stone" and the distinct polypeptide are
indicative of links that are not subject to the small peptide sequence error.

Phylogenetic Pathway Method 

The "phylogenetic profile" method compares protein sequences across all
known genomes and analyzes the pattern of inheritance of each protein across the different
organisms. In its simplest form, each protein is simply characterized by its presence or
absence in each organism. For example, if there are 16 known genomes, then each protein
may be assigned a 16-bit code or phylogenetic profile. Since proteins that function together
(e.g., in the same metabolic pathway or as part of a larger functional or structural complex)
evolve in a correlated fashion, they should have the same or similar patterns of inheritance,
and therefore similar phylogenetic profiles. Therefore, the function of one protein may be
inferred from the function of another protein, which has a similar profile, if its function is
known. As with the Rosetta Stone method, the function of one protein is inferred from the
function of another protein which is dissimilar in sequence. Furthermore, the predicted link
between the proteins has utility in developing, for example, drug targets, diagnostics and
therapeutics.

The phylogenetic profile method can be implemented in a binary code (e.g.,
describing the presence or absence of a given protein in an organism) or a continuous code
that describes how similar the related sequences are in the different genomes. In addition,
grouping of similar protein profiles may be made wherein similar profiles are indicative of
functionally related proteins. Furthermore, the requirements for similarity can be modified
depending upon particular criteria by varying the difference in similar bit requirements. For
example, criteria requiring that the degree of similarity in the profile include all 16 bits being
identical can be set, but may be modified so that similarity in 15 bits of the 16 bits would
indicate relatedness of the protein profiles as well. Statistical methods can be used to
determine how similar two patterns must be in order to be related.
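
The adjustable bit-matching criterion can be illustrated with a small Python sketch. Profiles are assumed here to be strings of "0" and "1" characters, one per genome; the function names and the two example profiles are hypothetical.

```python
def matching_bits(profile_a, profile_b):
    """Count the genomes at which two binary profiles agree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a == b for a, b in zip(profile_a, profile_b))

def profiles_related(profile_a, profile_b, min_matching_bits):
    """Apply an adjustable similarity criterion, e.g. 16 of 16 or 15 of 16 bits."""
    return matching_bits(profile_a, profile_b) >= min_matching_bits

# Two hypothetical 16-bit profiles that differ at exactly one genome.
p1 = "1101001110100101"
p2 = "1101001110100111"
```

Under a strict criterion of 16 identical bits these two profiles are unrelated, while relaxing the criterion to 15 of 16 bits groups them together.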

The phylogenetic profile method is applicable to any genome including, e.g.,
viral, bacterial, archaeal or eukaryotic. The method of phylogenetic profile grouping
provides the prediction of function for a previously uncharacterized protein(s). The method
also allows prediction of new functional roles for characterized proteins based upon
functional linkages. It also provides potential informative connections (i.e., links) between
uncharacterized proteins.

To represent the subset of organisms that contain a homolog, a phylogenetic
profile is constructed for each protein. The simplest manner to represent a protein's
phylogenetic history is via a binary phylogenetic profile for each protein. This profile is a
string with N entries, each one bit, where N corresponds to the number of genomes. The
number of genomes can be any number of two or more (e.g., 2, 3, 4, 5, 10, 100, to 1000 or
more). The presence of a homolog to a given protein in the nth genome is indicated with an
entry of unity at the nth position (e.g., in a binary system an entry of 1). If no homolog is
found the entry is zero. Proteins are clustered according to the similarity of their
phylogenetic profiles. Similar profiles show a correlated pattern of inheritance, and by
implication, functional linkage. The method predicts that the functions of uncharacterized
proteins are likely to be similar to characterized proteins within a cluster.
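
A minimal Python sketch of profile construction and grouping follows. It assumes the homology search has already been run, so that for each protein the set of genomes containing a homolog is known; the genome and protein names are hypothetical, and grouping by identical profiles is only the simplest possible clustering.

```python
def binary_profile(genomes, genomes_with_homolog):
    """Build an N-entry profile: '1' where a homolog was found, '0' otherwise."""
    return "".join("1" if g in genomes_with_homolog else "0" for g in genomes)

def cluster_by_profile(profiles):
    """Group proteins that share an identical profile."""
    clusters = {}
    for protein, profile in profiles.items():
        clusters.setdefault(profile, []).append(protein)
    return clusters

# Hypothetical homology search results for three proteins across four genomes.
genomes = ["genome1", "genome2", "genome3", "genome4"]
hits = {
    "P1": {"genome1", "genome3"},
    "P2": {"genome1", "genome3"},
    "P3": {"genome2"},
}
profiles = {p: binary_profile(genomes, found) for p, found in hits.items()}
clusters = cluster_by_profile(profiles)
```

Here P1 and P2 share the profile 1010 and fall into one cluster, so an unknown function for one could be inferred from the other.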

In order to decide whether a genome contains a protein related to another
particular protein, the query amino acid sequence is aligned with each of the proteins from
the genome(s) in question using a known alignment algorithm (see above). To determine the
statistical significance of any alignment score, the probability, p, of obtaining a higher score
when the sequences are shuffled is described. One way to compute a p value threshold is to
first consider the total number of sequence comparisons that are being aligned. If there are N
proteins in a first organism's genome and M in all other genomes, this number is N x M. If
this number of random sequences were compared, it would be expected that one pair would
yield a p value of 1/NM. This value can be set as a threshold. Other thresholds may be used
and will be recognized by those of skill in the art.

A non-binary phylogenetic profile can be used. In this method, the
phylogenetic profile is a string of N entries where each entry represents the evolutionary
distance of the query protein to its homolog in the corresponding genome. To define an
evolutionary distance between two sequences, an alignment between the two sequences is
performed. Such alignments can be carried out by any number of algorithms known in the art
(for examples, see those described above). The evolution is represented by a Markov process
with substitution rates, over a fixed interval of time, given by a conditional probability matrix:

$$p(aa \to aa')$$

where aa and aa' are any amino acids. One way to construct such a matrix is to
convert the BLOSUM62 amino acid substitution matrix (or any other amino acid
substitution matrix, e.g., PAM100, PAM250) from a log odds matrix to a conditional
probability (or transition) matrix:

$$P_B(i \to j) = p(j)\, 2^{\mathrm{BLOSUM62}_{ij}/2} \qquad (1)$$

$P_B(i \to j)$ is the probability that amino acid i will be replaced by amino acid j through
point mutations according to the BLOSUM62 scores. The p(j)'s are the abundances of amino
acid j and are computed by solving the 20 linear equations given by the normalization
conditions that:

$$\sum_{j} P_B(i \to j) = 1 \qquad (2)$$

The probability of this process is computed to account for the observed
alignment by taking the product of the conditional probabilities for each aligned pair:

$$P(p) = \prod_{n} p(aa_n \to aa'_n) \qquad (3)$$

A family of evolutionary models is then tested by taking powers of the
conditional probability matrix: $p' = p^{\alpha}(aa \to aa')$. The power $\alpha$ that maximizes P is
defined to be the evolutionary distance.
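
The power search can be sketched in Python under simplifying assumptions that are not part of the description above: a toy three-letter alphabet with a hand-made conditional probability matrix stands in for the 20 x 20 matrix derived from BLOSUM62, and only integer powers are tried. All names and values are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def log_likelihood(matrix, index, aligned_pairs):
    """Log of the product of conditional probabilities over the aligned pairs."""
    return sum(math.log(matrix[index[x]][index[y]]) for x, y in aligned_pairs)

def evolutionary_distance(base, alphabet, aligned_pairs, max_power=50):
    """Return the integer power of `base` that best explains the alignment."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    current = base
    best_power = 1
    best_ll = log_likelihood(current, index, aligned_pairs)
    for power in range(2, max_power + 1):
        current = mat_mul(current, base)        # base raised to `power`
        ll = log_likelihood(current, index, aligned_pairs)
        if ll > best_ll:
            best_power, best_ll = power, ll
    return best_power

# Toy conditional probability matrix; each row sums to 1.
ALPHABET = "ABC"
BASE = [[0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90]]

identical = [("A", "A")] * 10                    # no substitutions observed
diverged = [("A", "A")] * 5 + [("A", "B")] * 5   # half the positions differ
d_identical = evolutionary_distance(BASE, ALPHABET, identical)
d_diverged = evolutionary_distance(BASE, ALPHABET, diverged)
```

The identical alignment is best explained by the first power, while the diverged alignment is best explained by a higher power, which is the behavior expected of a distance measure. A fuller implementation would allow non-integer powers, for example through an eigendecomposition of the matrix.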

Many other schemes may be imagined to deduce the evolutionary distance
between two sequences. For example, one might simply count the number of positions in the
sequence where the two proteins have adopted different amino acids.

Although the phylogenetic history of an organism can be presented as a vector
(as described above), the phylogenetic profiles need not be vectors, but may be represented
by matrices. Such a matrix includes all the pairwise distances between a group of homologous
proteins, each one from a different organism. Similarly, phylogenetic profiles could be
represented as evolutionary trees of homologous proteins. Functional proteins could then be
clustered or grouped by matching similar trees, rather than vectors or matrices.

In order to predict function, different proteins are grouped or clustered
according to the similarity of their phylogenetic profiles. Similar profiles indicate a
correlated pattern of inheritance, and by implication, functional linkage.

Grouping or clustering may be accomplished in many ways. The simplest is
to compute the Euclidean distance between two profiles. Another method is to compute a
correlation coefficient to quantify the similarity between two profiles. All profiles within a
specified distance of the query profile are considered to be a cluster or group.
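
Both similarity measures can be sketched in Python as follows. Profiles are assumed to be equal-length lists of numbers (bits or evolutionary distances); the function names, the example profiles and the distance cutoff are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation coefficient; undefined for a constant profile."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cluster_around(query, profiles, max_distance):
    """Names of all profiles within max_distance of the query profile."""
    return sorted(name for name, prof in profiles.items()
                  if euclidean(query, prof) <= max_distance)

# Hypothetical profiles over six genomes.
profiles = {
    "P1": [1, 0, 1, 1, 0, 0],
    "P2": [1, 0, 1, 1, 0, 1],   # differs from P1 at one genome
    "P3": [0, 1, 0, 0, 1, 1],   # exact complement of P1
}
group = cluster_around(profiles["P1"], profiles, max_distance=1.0)
anti = pearson(profiles["P1"], profiles["P3"])
```

With a cutoff of 1.0 the query P1 is grouped with P2 only; P3, the complement of P1, has a correlation of -1 with it.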

Typically a genome database will be used as a source of sequence
information. Where the genome database contains only the nucleic acid sequence, that
sequence is translated to an amino acid sequence in frame (if known) or in all frames if
unknown. Direct comparison of the nucleic acid sequences of two or more organisms may
be feasible but will likely be more difficult due to the degeneracy of the genetic code.
Programs capable of translating a nucleic acid sequence are known in the art or easily
programmed by those of skill in the art to recognize a codon sequence for each amino acid.
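
As an illustration of such a program, the following Python sketch translates a DNA sequence in all six reading frames (three on the given strand and three on its reverse complement) using the standard genetic code. The function names are hypothetical, and codons containing characters other than A, C, G or T are reported as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna, frame=0):
    """Translate one reading frame; '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translations(dna):
    """Three forward frames followed by three reverse-complement frames."""
    reverse = dna.upper().translate(COMPLEMENT)[::-1]
    return [translate(strand, frame)
            for strand in (dna, reverse) for frame in range(3)]

frames = six_frame_translations("ATGGCCTAA")
```

Each of the six translated strings can then be used as a query in the protein-level comparisons described above.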

The phylogenetic profile provides an indication of those proteins in each of the
at least two organisms that share some degree of homology. Such a comparison can be done
by any number of alignment algorithms known in the art or easily developed by one skilled
in the art (see, for example, those listed above, e.g., BLAST, FASTA, etc.). In addition,
thresholds can be set regarding a required degree of homology. Each protein is then grouped
at 224 with related proteins that share a similar phylogenetic profile using grouping
algorithms.

"Functionally-, Structurally- or Metabolically-Linked" Method

The "physiologic linkage" method is a computational method that detects (i.e.,
identifies) proteins, and the genes that encode them, that participate in a common functional
pathway (e.g., cell motility or cell division), that participate in the synthesis of the same or a
similar structural complex (e.g., a cell wall) or participate in the same or similar metabolic
pathway (e.g., glycolysis, lipid synthesis, and the like). Proteins within these common
functional pathway groups are examples of "functionally linked" proteins. Having a
common functional "goal," they evolve in a correlated fashion. Thus, "homologs" in
different organisms can be comparatively identified. While these detection methods are very
effective in identifying functional homologues in the same subset of organisms, functional
linkages can be made between widely genetically disparate organisms.

In one aspect, metabolic pathways are defined as links between proteins that 
operate in the same metabolic pathway that can be identified by sequence identity searching, 
e.g., by performing a BLAST search to find top-scoring polypeptides with high similarity 
(BLAST alignment E-value < 10'^°) to polypeptides identified in a known pathway. For 

example, M. tuberculosis proteins were so analyzed against E. coli proteins; MTB proteins
whose E. coli homologs (i.e., having high similarity by BLAST alignment) act adjacently in
metabolic pathways as defined in the EcoCyc database (see, e.g., Karp (1998) Nucleic Acids
Res. 26:50-53) were identified.

In another example, flagellar proteins are found in bacteria that possess
flagella but not in other organisms. Accordingly, if two proteins have homologs in the same
subset of fully sequenced organisms, they are likely to be functionally linked. The methods
of the invention use this concept to systematically map links between all the proteins coded
by a genome.

Typically, functionally linked proteins have no amino acid sequence similarity
with each other and, therefore, cannot be linked by conventional sequence alignment
techniques. Accordingly, the methods of the invention identify drug targets that could not be
identified using conventional sequence comparison (i.e., sequence homology or sequence
identity) techniques.

Prediction of functionally linked proteins by the "phylogenetic method" can
also be used in conjunction with the "domain fusion" or "Rosetta Stone" method and also can
be filtered by other methods that predict functionally linked proteins, such as the protein
phylogenetic profile method or the analysis of correlated mRNA expression patterns. It was
found, when the Rosetta Stone predictions for S. cerevisiae were filtered by these two
methods, that proteins predicted to be functionally linked by two or more of these three
methods were as likely to be functionally related as proteins that were observed to physically
interact by experimental techniques like yeast 2-hybrid methods or co-immunoprecipitation
methods.
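
The "two or more of three methods" criterion can be illustrated with a short Python sketch. It assumes each method's predictions are available as a list of unordered gene pairs; the method labels, gene names and function name are hypothetical.

```python
from collections import Counter

def consensus_links(predictions_by_method, min_methods=2):
    """Return the links predicted by at least min_methods independent methods."""
    counts = Counter()
    for links in predictions_by_method.values():
        # frozenset makes (A, B) and (B, A) the same link; the set removes repeats.
        counts.update({frozenset(pair) for pair in links})
    return {pair for pair, n in counts.items() if n >= min_methods}

# Hypothetical predictions from three methods.
predictions = {
    "rosetta_stone": [("geneA", "geneB"), ("geneC", "geneD")],
    "phylogenetic_profile": [("geneB", "geneA"), ("geneE", "geneF")],
    "mrna_coexpression": [("geneC", "geneD"), ("geneA", "geneB")],
}
supported = consensus_links(predictions)
```

In this toy input the A-B and C-D links are each supported by at least two methods, while the E-F link is predicted by only one and is dropped.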

For example, a combination of these methods of prediction can be used to
establish links between proteins of closely related function. The methods of the invention
(i.e., the "Rosetta Stone" method and the "phylogenetic profile" method) can be combined
with one another or with other protein prediction methods known in the art; see, for example,
Eisen (1998) "Cluster analysis and display of genome-wide expression patterns," Proc. Natl.
Acad. Sci. USA, 95:14863-14868.

The various techniques, methods, and variations thereof described can be
implemented in part or in whole using computer-based systems and methods. Additionally,
computer-based systems and methods can be used to augment or enhance the functionality
described above, increase the speed at which the functions can be performed, and provide
additional features and aspects as a part of or in addition to those of the invention described
elsewhere in this document. Various computer-based systems, methods, and
implementations in accordance with this technology are described herein.

Proteins linked to current drug targets 

The invention also provides a novel method for identifying a polypeptide, or
the nucleic acid sequence that encodes it, that is a target for a drug. The method analyzes the
functional relationship between at least two sequences, wherein at least one of the sequences
is a known target of a drug or encodes a polypeptide drug target. The method comprises
identifying proteins, and the genes that encode them, that are functionally linked to the
targets of known drugs. The functional linkage is determined by using the "domain fusion"
method, the "phylogenetic profile" method or the "physiologic linkage" method, or a
combination thereof, as described herein.

Thus, this aspect of the invention provides methods for identifying drug targets
from among all or a subset of genes in a genome using computationally-determined
functional linkages. In one implementation of the method, functional linkages are calculated
using the "domain fusion" method, the "phylogenetic profile" method or the "physiologic
linkage" method, or a combination thereof, between all "query genome genes." Next, each
set of genes predicted to be functionally linked to either a known drug target or to a sequence
homolog or ortholog (defined below) of a known drug target is examined. These proteins
(and the nucleic acids that encode them) are functionally linked to known drug targets; thus,
they are operating in the same pathways or systems targeted by the known drug.
Accordingly, the methods of the invention have identified them as drug targets.
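
The selection step can be sketched in Python as follows, assuming the functional links have already been computed as gene pairs. The function name is hypothetical, and the small input list is illustrative only: it reuses a few gene names from this document together with a made-up geneX-geneY pair.

```python
def candidate_targets(links, known_targets):
    """Map each gene linked to a known drug target onto the targets it is linked to."""
    candidates = {}
    for a, b in links:
        for target, partner in ((a, b), (b, a)):
            if target in known_targets and partner not in known_targets:
                candidates.setdefault(partner, set()).add(target)
    return candidates

# Illustrative input only; not a complete or measured set of links.
links = [("inhA", "pks6"), ("pks6", "accD3"), ("rpoB", "rpoC"), ("geneX", "geneY")]
found = candidate_targets(links, known_targets={"inhA", "rpoB"})
```

Only direct partners of a known target are returned in this sketch; following links one step further (for example from pks6 to accD3) would be a straightforward extension.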

This method is particularly effective for identifying drug targets in pathogens,
such as microorganisms, e.g., bacteria, viruses and the like. This method allows for the
identification of novel drug targets that cannot be identified by other techniques, such as
traditional sequence homology or sequence identity comparison techniques. Several known
drug targets in M. tuberculosis were used with the methods of the invention to use functional
linkages to identify potential new drug targets in the same pathways as the known drug
targets.

There are very few drugs that are effective for anti-tuberculosis therapy, since
the complex lipid-rich mycobacterial cell wall is impermeable to many antibacterial agents.
Additionally, single- and multi-drug resistance is rapidly emerging against these drugs. To
address this issue, the methods of the invention were used to identify Mycobacterium
tuberculosis (MTB or TB) proteins that are functionally linked to the targets of known drugs.
Inhibiting these proteins should have the same effect on the organism as the drug, since the
same processes or pathways would be disrupted. Targeting multiple components of a given
biochemical pathway would also diminish the opportunity for the development of resistance
because various related proteins would have to mutate against inhibitors while preserving the
overall functionality of the pathway.

A list of targets of essential anti-TB drugs (World Health Organization,
Geneva, Switzerland) was compiled. The anti-TB drugs included isoniazid, rifampicin,
ethambutol, streptomycin, pyrazinamide and thiacetazone. Although not enough is known
about the molecular basis of action of the latter two, the functional linkages of the known
drug targets were examined.

Isoniazid. This is one of the most widely used of all anti-tuberculosis drugs.
It is believed that the compound is activated by the catalase-peroxidase KatG. Once
activated, it then attaches to a nicotinamide adenine dinucleotide bound to the enoyl-acyl
carrier protein reductase InhA, resulting in the inhibition of mycolic acid biosynthesis
(see, e.g., Rozwarski (1998) Science 279:98-102).

Using the "phylogenetic profile" method, the inhA gene was "linked" to, or functionally
associated with, two polyketide synthases, pks1 and pks6 (Figure 1), both of which contain
acyl carrier protein motifs. The polyketide synthase pks6 is in turn known from established
metabolic pathways to be linked to the fatty acid biosynthesis gene accD3. Further, pks6 is
linked to fadD28 and to the operon containing the genes ppsA-E, all recently reported to be
crucial for bacterial replication in host lungs (see, e.g., Cox (1999) Nature 402:79-83).

The inhA gene was also linked to an operon encoding two putative
oxidoreductases and a gene of entirely unknown function. The inhA gene was further linked
to a second operon that includes pepR and gpsI. PepR is a protease whose Bacillus subtilis
homolog is adjacent to the genes coding for enzymes that synthesize diaminopimelate, a
component of the cell wall incorporated by the murE gene product and diaminopicolinate
(see, e.g., Chen (1993) J. Biol. Chem. 268:9448-9465). PepR is an ortholog of an essential
yeast gene and is likely to be essential for MTB (see below). GpsI is a putative
multifunctional enzyme involved in guanosine pentaphosphate synthesis and
polyribonucleotide nucleotidyltransfer. The high reliability of the predicted functional link
between gpsI and pepR and the absence of eukaryotic homologs suggests that gpsI could be a
promising target for drug design.

Rifampicin. This compound, along with the related rifabutin and KRM-1648,
is believed to act by directly targeting the RNA polymerase β-subunit (rpoB), given that
96% of resistant isolates were found to have mutations of various types in a limited region of
the rpoB gene (see, e.g., Yang (1998) J. Antimicrob. Chemother. 42:621-628).

Using the methods of the invention, as expected, functional linkages were 
found to another RNA polymerase subunit, rpoC, as well as to various tRNA synthases and 
ribosomal proteins. However, no functional links to uncharacterized proteins were found. 

Ethambutol. This drug is effective against tuberculosis when used in
combination with isoniazid. It is believed that the drug interacts with the EmbB protein, a
probable arabinosyl-transferase, inhibiting the biosynthesis of arabinan, a component of
cell-envelope lipids. As with rifampicin, the evidence for this interaction is indirect, since
mutations in the embB gene are responsible for ethambutol resistance (see, e.g., Lety (1997)
Antimicrob. Agents Chemother. 41:2629-2633).

The "gene proximity" method correctly clusters embB with embA (Rv3794).
This cluster is linked to a set of mostly uncharacterized genes by the "phylogenetic profile"
method; see Figure 2, which shows an analysis of EmbB, the target for the anti-tuberculosis
drug ethambutol, and shows functional linkages to genes mostly of unknown function but
with some indications of localization at the bacterial membrane.

Two of the uncharacterized genes, Rv1706c and Rv1800, belong to the
abundant PE/PPE family of proteins hypothesized to be a source of antigenic variation with
the potential ability to interfere with immune responses by inhibiting antigen processing (see,
e.g., Cole (1998) Nature 393:537-544). A third uncharacterized gene, Rv1967, belongs to
one of the four copies of the mce operon. This operon consists of eight genes coding for
integral membrane proteins and proteins that have N-terminal signal sequences or
hydrophobic segments and are believed to be involved in pathogenicity (see, e.g., Cole
(1998) supra). Rv0528 codes for a hypothetical membrane protein and Rv2159c corresponds
to the murF gene, which participates in the biosynthesis of peptidoglycan precursors.

The majority of the "links," or functionally associated sequences, involved
proteins associated with processes related to the bacterial cell wall (with the possible
exception of atsA and the putative choline dehydrogenase Rv1279, whose relationship to
these processes is not immediately obvious). The proteins of unknown function are therefore
also expected to play some role in these processes and are thus of interest as potential drug
targets.

Streptomycin. This drug acts by binding to the 16S rRNA and inhibits protein
synthesis. Resistance to this compound emerges from mutations in the corresponding gene
(rrs), as well as in the gene encoding the ribosomal protein S12 (rpsL). Disruptions to
RpsL effect streptomycin resistance by altering the higher order structure of 16S rRNA (see,
e.g., Sreevatsan (1996) Antimicrob. Agents Chemother. 40:1024-1026).

Although streptomycin doesn't directly target RpsL, the functional links
generated for this protein were examined, as any target whose inhibition will ultimately
disrupt bacterial protein synthesis is likely to be an effective antigrowth/anti-microbial
target. As with the rifampicin target, the only functional linkages found for this protein were
the expected protein synthesis-related proteins, including large ribosomal subunit proteins
L2, L5, L11, and L14; small ribosomal subunit proteins S4, S5, S7, S8, and S11; elongation
factors fusA and Ef-Tu; the chaperones GroEL, clpB and ftsH; and the Clp protease subunits
clpC and clpX.

Proteins linked to cell-wall related proteins 

The invention also provides a novel method for identifying a nucleic acid or a
polypeptide sequence in an organism that is linked to a cell-wall related protein. The method
analyzes the functional relationship between at least two sequences, wherein at least one of
the sequences is a cell-wall related protein, or, the sequence is a nucleic acid sequence that
encodes a cell-wall related protein. The method comprises identifying proteins, and the
genes that encode them, that are functionally linked to a cell-wall related protein. The
functional linkage is determined by using the "domain fusion" method, the "phylogenetic
profile" method or the "physiologic linkage" method, or a combination thereof, as described
herein.

Approximately eleven M. tuberculosis proteins are indicated by sequence
homology to be penicillin-binding proteins, thought to synthesize peptidoglycan in the course
of cell elongation and cell wall metabolism (see, e.g., Broome-Smith (1985) Eur. J. Biochem.
147:437-446). Using the methods of the invention, the functional linkages found for these
proteins map out many of the known cell wall synthetic enzymes and reveal more than 10
proteins of unknown function that may also participate in cell wall metabolism. Figure 3
shows an analysis of five of the approximately eleven MTB proteins presumed to bind
penicillin to reveal functional linkages to various potential operons consisting of genes
involved in various aspects of cell wall metabolism, including cell shape determination and
peptidoglycan biosynthesis, as well as more than ten genes of unknown function, which we can
now associate with cell wall metabolism.

Three of the proteins (pbpA, pbpB, and ponA1) reside in conserved gene
clusters, presumably operons. Other genes in the clusters around pbpA and pbpB are also
implicated in cell wall metabolism. For example, pbpA resides next to rodA, a
membrane-associated protein whose E. coli homolog determines cell shape and is required for
enzymatic activity of penicillin binding proteins (see, e.g., Matsuzawa (1989) J. Bacteriol.
171:558-560). Likewise, pbpB resides next to six peptidoglycan biosynthesis genes and the two
septum and cell wall formation proteins ftsW and ftsZ.

Two additional gene clusters were linked to these penicillin binding proteins
by either the "phylogenetic profile" or "Rosetta Stone" pattern methods of the invention.
One cluster is composed of the peptidoglycan synthetic protein murB and a putative
membrane protein of unknown function that the functional linkages suggest is involved in
cell wall metabolism. The second gene cluster contains four genes, three of which are
predicted to reside in the cell membrane or envelope. Therefore, the uncharacterized genes
in these clusters are likely to be involved in cell wall metabolism, closely related to the
function of the penicillin binding proteins, and are therefore promising drug targets.

Another gene linked to cell wall metabolism by the computationally-derived
linkage methods of the invention is gcpE; see Figure 4, which shows that the uncharacterized
gene gcpE, known to be essential for bacterial survival (see, e.g., Baker (1992) FEMS
Microbiol. Lett. 73:175-180), is predicted to be involved in cell wall metabolism through its
functional links to a putative membrane protein and two murein hydrolase genes, lytB1 and
lytB2, involved in cell separation. The genes forming a putative operon with gcpE are
proposed as potential drug targets. The functional linkages place gcpE in a conserved gene
cluster with two genes of unknown function, one of which encodes a membrane protein.
However, the three genes show correlated inheritance with two homologs of lytB, an E. coli
gene involved in penicillin tolerance (see, e.g., Gustafson (1993) J. Bacteriol. 175:1203-1205)
and recently shown to encode a murein hydrolase essential for cell separation (see, e.g.,
Garcia (1999) Mol. Microbiol. 31:1275-1277). The uncharacterized proteins from this
cluster are therefore expected to participate in processes similar to GcpE and might therefore
be promising drug targets.

Proteins linked to potentially novel pathways

The invention also provides a novel method for identifying a polypeptide, or a
nucleic acid that encodes it, that is linked to potentially novel biochemical (e.g., biosynthetic,
metabolic) pathways. The method analyzes the functional relationship between at least two
sequences, wherein at least one of the sequences is associated with a biochemical pathway,
such as a pathway in a microorganism that enables the pathogen to evade an immune process.
The method comprises identifying proteins, and the genes that encode them, that are
functionally linked to the pathway-linked sequences. The functional linkage is determined
by using the "domain fusion" method, the "phylogenetic profile" method or the "physiologic
linkage" method, or a combination thereof, as described herein.

For example, the htrA gene encodes a putative heat shock protein
homologous to HtrA from Salmonella typhimurium, a serine protease that degrades aberrant
periplasmic proteins. Mutations in this protein have been linked with reduced viability in
host macrophages (see, e.g., Johnson (1991) Mol. Microbiol. 5:401-407). Thus, it was
decided to investigate the function of htrA. Using the methods of the invention, results
indicated that the HtrA protein is part of a process that has not yet been characterized. The
gene is predicted with very high reliability to function with the uncharacterized gene
Rv1224c; see Figure 5, which shows the involvement of htrA in a potentially novel pathway:
the gene encoding the putative heat shock protein HtrA is functionally linked to a set of
genes mostly of unknown function, suggesting the existence of a novel pathway. The
partially characterized proteins suggest that the pathway relates to membrane-associated
processes such as signaling and/or transport. The lack of eukaryotic homologs for most of
the genes linked to htrA suggests that proteins of this pathway could be promising drug
targets.

Through its phylogenetic profile, htrA is linked to a group of uncharacterized
proteins, including a putative lipid esterase (Rv1900c), an ABC transporter (Rv3783) and the
uncharacterized protein Rv1216c, which has weak homology to the laminin B receptor of
Xenopus laevis, suggesting that it might be a membrane protein. From this analysis, it can be
concluded that htrA is part of a novel pathway that involves membrane-associated processes,
such as signaling and/or transport. Because the majority of the proteins linked to htrA have
no eukaryotic homologs, and given the importance of htrA in S. typhimurium pathogenesis,
this pathway represents another potential source of novel targets for anti-tuberculosis drugs.

Proteins linked to essential proteins 

The invention also provides a novel method for identifying a polypeptide, or
the nucleic acid sequence that encodes it, that is linked to an essential protein (e.g., a protein
necessary for the growth of an organism, such as a bacterium). The method analyzes the
functional relationship between at least two sequences, wherein at least one of the sequences
is linked to an essential protein, or, the sequence is a nucleic acid sequence that itself is
essential or encodes a polypeptide linked to an essential protein. The functional linkage is
determined by using the "domain fusion" method, the "phylogenetic profile" method or the
"physiologic linkage" method, or a combination thereof, as described herein.

For example, the MIPS database (Munich Information Center for Protein
Sequences; MIPS provides access through its WWW server to a spectrum of generic
databases, including PEDANT, MYGD, MATD, MEST, the PIR-International Protein
Sequence Database, the protein family database PROTFAM, the MITOP database, and the
all-against-all FASTA database; see, e.g., Mewes (1999) Nucleic Acids Res. 27:44-48)
contains a list of 734 genes that are essential for Saccharomyces cerevisiae viability (see,
e.g., Mewes (1999) supra). A list of Mycobacterium tuberculosis genes orthologous to these
essential genes was generated. Using the methods of the invention, 60 such genes were
found. The products of these genes have a high likelihood of also being essential to the
tuberculosis bacterium and therefore could be promising therapeutic targets. Furthermore,
since the list of essential genes came from a eukaryote, there is a significant chance that these
genes would also be found in the human genome.

Automatic Method to Identify Drug Targets from Functional Linkages

One aspect of the invention provides a computational method to identify 
potential drug targets among the proteins expressed by a genome. This aspect takes 
advantage of the functional linkages calculated between genes in a genome using the 
methods described herein, as well as the detection of sequence homology and the knowledge
of a set of lethal or "essential" genes in one or more organisms.

To identify drug targets in a query genome, the sequence homology between
all of the genes in that genome and all of the genes in the genome of an organism for which
essential genes are known is calculated. For example, as discussed herein, the query genome
is Mycobacterium tuberculosis (TB) and the genome with known essentials is the yeast S.
cerevisiae. Sequence homology between all TB genes and all yeast genes was calculated
using the methods of the invention.

"Equivalent" or "orthologous" genes were also identified by another aspect of
the invention that comprises doing a reverse sequence search (e.g., yeast vs. TB) and then
choosing pairs of genes that are each other's best-scoring match in the two searches. In one
exemplary aspect, MTB orthologs of Saccharomyces cerevisiae genes were generated by
finding all pairs of genes (TBi,SCj) where TBi was the top hit from a BLAST search of the
yeast gene SCj against the MTB genome, SCj was the top hit from a BLAST search of the
MTB gene TBi against the Saccharomyces cerevisiae genome and both top hits had a
BLAST E-value <= 1x10*'.

For example, a TB gene is an ortholog of a yeast gene if the yeast gene is the
best scoring sequence match when yeast is searched with the TB gene, and the TB gene is the
best scoring sequence match when TB is searched with the yeast gene. We define these
"symmetric" pairs as "orthologs."

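By way of illustration only, the symmetric best-hit rule described above can be sketched in
Python as follows. The function name, the dictionary layout and the gene identifiers are
hypothetical and are not taken from the methods of the invention; the E-value cutoff is a
caller-supplied parameter, and the value used in the usage note is only a placeholder.

```python
from typing import Dict, List, Tuple

# Hypothetical summary of a BLAST run: for each query gene, its single
# best-scoring hit in the other genome and that hit's E-value.
BestHits = Dict[str, Tuple[str, float]]


def reciprocal_best_hits(
    tb_vs_yeast: BestHits,
    yeast_vs_tb: BestHits,
    max_evalue: float,
) -> List[Tuple[str, str]]:
    """Return (TB gene, yeast gene) pairs that are each other's top hit.

    A pair is kept only if both top hits have an E-value at or below
    max_evalue (an assumed, caller-chosen cutoff).
    """
    orthologs = []
    for tb_gene, (yeast_gene, e_forward) in tb_vs_yeast.items():
        reverse = yeast_vs_tb.get(yeast_gene)
        if reverse is None:
            continue  # the yeast gene had no hit in the TB genome
        tb_back, e_reverse = reverse
        symmetric = tb_back == tb_gene
        if symmetric and e_forward <= max_evalue and e_reverse <= max_evalue:
            orthologs.append((tb_gene, yeast_gene))
    return sorted(orthologs)
```

For instance, calling reciprocal_best_hits with a TB table in which TB1 and TB2 both hit SC1,
and a yeast table in which SC1 hits TB1, keeps only the pair (TB1, SC1), since TB2 is not the
reverse top hit of SC1.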
After identifying orthologs between the query genome and the genome with
known essential genes, a set of query genome genes that are orthologs of known essential
genes in the other genome was chosen. These genes were designated the set of "putative
essentials". For the purposes of the algorithm of the invention, these query genome genes are
assumed to be essential genes, since they are the equivalents of essential genes in another
genome. These genes act as "markers" or indicators of essential pathways in the query
genome. One could supplement this set with genes already known to be essential in the
query organism. Functional linkages (determined by the methods of the invention) between
all query genome genes were examined. The query genome genes linked to any of the
putative essential genes were collected. This set of genes was designated as the "predicted
members of essential pathways." These genes are likely to be involved in important
pathways, since the (predicted) pathways have members that are putative essentials. Lastly,
the method removes from the set of genes in predicted essential pathways all of those genes
that have sequence homology to eukaryotic genes or proteins. The genes that remain after
this filtering step are the predicted drug targets for the query organism.
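
The three steps above (mark putative essentials, collect their functional neighbors, remove
genes with eukaryotic homologs) can be sketched as simple set operations. This is a minimal
sketch under stated assumptions: functional links are treated as undirected gene pairs, a gene
qualifies if it is linked to at least one putative essential, and all names and inputs are
hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def predict_drug_targets(
    orthologs: Iterable[Tuple[str, str]],       # (query gene, reference gene)
    essential_in_reference: Set[str],           # known essentials, e.g. in yeast
    links: Iterable[Tuple[str, str]],           # functional links, undirected
    has_eukaryotic_homolog: Set[str],           # query genes to filter out
    known_essentials: FrozenSet[str] = frozenset(),  # optional supplement
) -> Set[str]:
    # Step 1: putative essentials are query genes whose ortholog is essential
    # in the reference genome, plus any supplied known essentials.
    putative = {q for q, r in orthologs if r in essential_in_reference}
    putative |= set(known_essentials)

    # Step 2: predicted members of essential pathways are the genes
    # functionally linked to a putative essential.
    pathway_members: Set[str] = set()
    for a, b in links:
        if a in putative:
            pathway_members.add(b)
        if b in putative:
            pathway_members.add(a)

    # Step 3: drop genes with sequence homology to eukaryotic genes.
    return {g for g in pathway_members if g not in has_eukaryotic_homolog}
```

Keeping each stage as a plain set makes it easy to inspect the intermediate "putative
essentials" and "predicted members of essential pathways" separately.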

As a benchmark, this method was applied to the M. tuberculosis genome. Of
the over 3900 genes in TB, 11 were identified as potential drug targets. Comparing this list
of 11 predicted targets to the fewer than 10 known anti-TB drug targets, one gene was a
known drug target and one was linked to a known drug target. Accordingly, the algorithm of
the invention performed statistically significantly better than a random choice of genes.
A rough estimate of statistical significance suggests that one would expect to see 2 of 10
known drug targets in a sample of 11 out of 3900 genes only 3.8 times out of 10,000 trials
(probability of occurring by random chance of 3.8 x 10^-4). Therefore, this embodiment of the
method is an entirely computational algorithm drawing on the demonstrated ability of the
general methods of the invention to predict functional linkages between genes and to
effectively identify drug targets in bacteria. The effectiveness of this method to identify
novel drug targets was clearly demonstrated when the algorithm was applied to the M.
tuberculosis genome.
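
One common way to make a rough estimate of this kind is a hypergeometric tail probability,
i.e., the chance that a random draw of 11 genes from 3900 contains at least 2 of 10 marked
genes. The sketch below assumes that model; it is not stated to be the calculation behind the
figure quoted above and need not reproduce it exactly, although it gives a value of the same
order of magnitude. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k_min: int, population: int, marked: int, draws: int) -> float:
    """P(X >= k_min) when `draws` genes are sampled without replacement
    from `population` genes, of which `marked` are known targets."""
    total = comb(population, draws)
    upper = min(marked, draws)
    favorable = sum(
        comb(marked, k) * comb(population - marked, draws - k)
        for k in range(k_min, upper + 1)
    )
    return favorable / total


# Numbers taken from the text: 3900 genes, 10 known targets, 11 predictions.
p = prob_at_least(2, 3900, 10, 11)
print(f"{p:.2e}")
```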

The specific inhibition of the MTB homologs might be difficult. To address
this issue, using the methods of the invention, functional links to the essential genes were
searched. Functional links were selected which either do not have homologs in yeast, or the
enzymatic activity of their products is known to be absent in human cells. Using the
highest confidence data, functional links for 23 of the genes (indicated in bold in Table 1)
were found. Eight of these were linked to 12 unique MTB genes that satisfied the criteria
of the invention's methods (Table 1). Exemplary findings include:

(1) the gene folP, which encodes the enzyme dihydropteroate synthase
(DHPS), known to be the target of sulfonamide antibacterial drugs. Although it is found in
some eukaryotes, DHPS activity is not found in human cells (see, e.g., Huovinen (1995)
Antimicrob. Agents Chemother. 39:279-289).

(2) the product of the gene folK, a 7,8-dihydro-6-hydroxymethylpterin
pyrophosphokinase, has recently been proposed as a target for broad-spectrum
antibacterial drugs (see, e.g., Stammers (1999) FEBS Lett. 456:49-53).

(3) the gene gpsI is not only strongly linked to the essential yeast gene pepR,
but it is also functionally linked to inhA, the target of the drug isoniazid (see above), making 
it a very compelling candidate for drug design.

Table 2. Subset of genes from Table 1 that are functionally linked to genes without
yeast homologs.

Gene         Links [a]    Comments
-----------  -----------  ------------------------------------------------------------
[illegible]  Rv0002       dnaN  DNA polymerase III, β-subunit
             Rv0003       recF  DNA replication and SOS induction
             Rv0006       gyrA  DNA gyrase subunit A

Rv0350       Rv0351       grpE  stimulates DnaK ATPase activity
             Rv0352       dnaJ  acts with GrpE to stimulate DnaK ATPase

Rv1010       Rv1008       Similar to E. coli hypothetical protein YcfH
             Rv1009       Possible lipoprotein, similar to various other MTB proteins
             Rv1011       Similar to E. coli hypothetical protein YcbH

Rv2439c      Rv2427c      proA  γ-glutamyl phosphate reductase
             Rv2440c      obg   Obg GTP-binding protein
             Rv2441c      rpmA  50S ribosomal protein L27
             Rv2442c      rplU  50S ribosomal protein L21

Rv2782c      Rv2783c      gpsI  pppGpp synthase and polyribonucleotide phosphorylase

Rv3598c      Rv3600c      similar to Bacillus subtilis hypothetical protein YacB
             Rv3606c      folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
             Rv3607c      folX  may be involved in folate biosynthesis
             Rv3608c [b]  folP  dihydropteroate synthase (DHPS)
             Rv3610c      ftsH  inner membrane protein, chaperone

Rv3608c      Rv3598c      lysS  lysyl-tRNA synthase
             Rv3600c      similar to Bacillus subtilis hypothetical protein YacB
             Rv3606c      folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
             Rv3607c      folX  may be involved in folate biosynthesis
             Rv3609c      folE  GTP cyclohydrolase I
             Rv3610c      ftsH  inner membrane protein, chaperone

Rv3609c      Rv3606c      folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
             Rv3607c      folX  may be involved in folate biosynthesis
             Rv3608c [b]  folP  dihydropteroate synthase (DHPS)

[a] Genes without yeast homologs shown in boldface
[b] DHPS activity is found in some eukaryotic cells but not in human cells

In summary, the methods of the invention allowed identification of this
combination of functional linkages to essential genes. This information, together with the
lack of eukaryotic homologs for these genes, makes this group of proteins promising drug
targets, particularly because their inhibition is expected to disrupt vital bacterial processes
with a low likelihood of toxicity from the inhibition of a host equivalent.

Computer Implementation 

The various techniques, methods, and aspects of the invention described 
herein can be implemented in part or in whole using computer-based systems and methods. 
Additionally, computer-based systems and methods can be used to augment or enhance the 
functionalities and algorithms described herein, increase the speed at which the functions can 
be performed, and provide additional features and aspects as a part of or in addition to those 
of the invention described elsewhere in this document. Various exemplary computer-based
systems, methods and implementations in accordance with the above-described technology
are presented herein.

The processor-based system can include a main memory, such as a random 
access memory (RAM), and can also include a secondary memory. The secondary memory 
can include, for example, a hard disk drive and/or a removable storage drive, representing a 
floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage 
drive reads from and/or writes to a removable storage medium. Removable storage media
can be a floppy disk, magnetic tape, an optical disk, and the like, which can be read by and
written to by the removable storage drive. The removable storage media can include a
computer usable storage medium having stored therein computer software and/or data.

In alternative embodiments, secondary memory may include other similar 
means for allowing computer programs or other instructions to be loaded into a computer
system. Such means can include, for example, a removable storage unit and an interface.
Examples of such can include a program cartridge and cartridge interface (such as those found
in video game devices), a removable memory chip (such as an EPROM, or PROM) and
associated socket, and other removable storage units and interfaces that allow software and
data to be transferred from the removable storage unit to the computer system.

The computer system can also include a communications interface.
Communications interfaces allow software and data to be transferred between the computer
system and external devices. Examples of communications interfaces include modems,
network interfaces (such as, for example, an Ethernet card), communications ports, PCMCIA
slots and cards, and the like. Software and data transferred via a communications interface
can be in the form of signals that can be electronic, electromagnetic, optical or other signals
capable of being received by a communications interface. These signals can be provided to
the communications interface via a channel capable of carrying signals and can be implemented
using a wireless medium, wire or cable, fiber optics or other communications medium. Some
examples of a channel can include a phone line, a cellular phone link, an RF link, a network
interface, and other communications channels.

As used herein, the terms "computer program medium" and "computer usable 
medium" are used to generally refer to media such as a removable storage device, a disk 
capable of installation in a disk drive, and signals on a channel, or equivalents thereof. These 
computer program products are means for providing software or program instructions to 
computer systems. Computer programs (also called computer control logic) can be stored in 
main memory and/or secondary memory. Computer programs can also be received via a 
communications interface. Such computer programs, when executed, enable the computer 
system to perform the features of the present invention as discussed herein. Computer 
programs, when executed, enable the processor to perform the features of the present 
invention. Accordingly, in one aspect of the invention, such computer programs represent 
controllers of the computer system. 

In another aspect of the invention, the methods and algorithms are
implemented using software. The software may be stored in, or transmitted via, a computer
program product and loaded into a computer system using a removable storage drive, hard 
drive or communications interface. The control logic (software), when executed by the 
processor, causes the processor to perform the functions of the invention as described herein. 

In another aspect, the elements are implemented primarily in hardware using, 
for example, hardware components such as PALs, application specific integrated circuits
(ASICs) or other hardware components. Implementation of a hardware state machine so as
to perform the functions described herein will be apparent to persons skilled in the relevant
art(s). In yet another embodiment, elements are implemented using a combination of both
hardware and software.

In another aspect, the computer-based methods can be accessed or 
implemented over the World Wide Web by providing access via a Web Page to the methods 
of the present invention. Accordingly, the Web Page is identified by a Universal Resource 
Locator (URL). The URL denotes both the server machine, and the particular file or page on 
that machine. In this embodiment, it is envisioned that a consumer or client computer system
interacts with a browser to select a particular URL, which in turn causes the browser to send
a request for that URL or page to the server identified in the URL. Typically the server
responds to the request by retrieving the requested page, and transmitting the data for that
page back to the requesting client computer system (the client/server interaction is typically
performed in accordance with the hypertext transport protocol ("HTTP")). The selected page
is then displayed to the user on the client's display screen. The client may then cause the
server containing a computer program of the present invention to launch an application
comprising a method of the invention, for example, to identify a nucleic acid or a polypeptide
sequence that may be a target for a drug, comprising the steps of (a) providing a first nucleic
acid or a polypeptide sequence that is known to be a drug target; (b) providing an algorithm
capable of analyzing a functional relationship between nucleic acid or polypeptide sequences
selected from the group consisting of a "domain fusion" method, a "phylogenetic profile"
method and a "physiologic linkage" method; and, (c) comparing the first nucleic acid or the
polypeptide drug target sequence to a plurality of sequences using at least one algorithm to
identify a second sequence that has a functional relationship to the first sequence, thereby
identifying a nucleic acid or a polypeptide sequence that may be a target for a drug, based on
a query sequence provided by the client.

Nucleic Acids and Polypeptides 

The invention also provides isolated nucleic acids and polypeptides
comprising the sequences as set forth in Table 3 and Table 4 (below). As used herein,
"isolated," when referring to a molecule or composition, such as, e.g., an isolated infected
cell comprising a nucleic acid sequence derived from a library of the invention, means that
the molecule or composition (including, e.g., a cell) is separated from at least one other
compound, such as a protein, DNA, RNA, or other contaminants with which it is associated
in vivo or in its naturally occurring state. Thus, a nucleic acid or polypeptide or peptide
sequence is considered isolated when it has been isolated from any other component with
which it is naturally associated. An isolated composition can, however, also be substantially
pure. An isolated composition can be in a homogeneous state. It can be dry or in an
aqueous solution. Purity and homogeneity can be determined, e.g., using any analytical
chemistry technique, as described herein.

The term "nucleic acid" or "nucleic acid sequence" refers to a deoxyribonucleotide
or ribonucleotide oligonucleotide, including single- or double-stranded, or
coding or non-coding (e.g., "antisense") forms. The term encompasses nucleic acids, i.e.,
oligonucleotides, containing known analogues of natural nucleotides. The term also
encompasses nucleic-acid-like structures with synthetic backbones, see, e.g., Oligonucleotides
and Analogues, a Practical Approach, ed. F. Eckstein, Oxford Univ. Press (1991); Antisense
Strategies, Annals of the N.Y. Academy of Sciences, Vol 600, Eds. Baserga et al. (NYAS
1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications
(1993, CRC Press), WO 97/03211; WO 96/39154; Mata (1997) Toxicol. Appl. Pharmacol.
144:189-197; Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense
Nucleic Acid Drug Dev 6:153-156. As used herein, the "sequence" of a nucleic acid or gene
refers to the order of nucleotides in the polynucleotide, including either or both strands (sense
and antisense) of a double-stranded DNA molecule, e.g., the sequence of both the coding
strand and its complement, or of a single-stranded nucleic acid molecule (sense or antisense).
For example, in alternative embodiments, promoters drive the transcription of sense and/or
antisense polynucleotide sequences of the invention, as exemplified by Table 3.

The terms "polypeptide," "protein," and "peptide" include compositions of the
invention that also include "analogs," or "conservative variants" and "mimetics"
("peptidomimetics") with structures and activity that substantially correspond to the
exemplary sequences, such as the sequences in Table 4. Thus, the terms "conservative
variant" or "analog" or "mimetic" also refer to a polypeptide or peptide which has a modified
amino acid sequence, such that the change(s) do not substantially alter the polypeptide's (the
conservative variant's) structure and/or activity (e.g., immunogenicity, ability to bind to
human antibodies, etc.), as defined herein. These include conservatively modified variations
of an amino acid sequence, i.e., amino acid substitutions, additions or deletions of those
residues that are not critical for protein activity, or substitution of amino acids with residues
having similar properties (e.g., acidic, basic, positively or negatively charged, polar or
non-polar, etc.) such that the substitutions of even critical amino acids do not substantially
alter structure and/or activity. Conservative substitution tables providing functionally similar
amino acids are well known in the art. For example, one exemplary guideline to select
conservative substitutions includes (original residue followed by exemplary substitution):
ala/gly or ser; arg/lys; asn/gln or his; asp/glu; cys/ser; gln/asn; gly/asp; gly/ala or pro;
his/asn or gln; ile/leu or val; leu/ile or val; lys/arg or gln or glu; met/leu or tyr or ile; phe/met
or leu or tyr; ser/thr; thr/ser; trp/tyr; tyr/trp or phe; val/ile or leu. An alternative exemplary
guideline uses the following six groups, each containing amino acids that are conservative
substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D),
Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5)
Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine
(Y), Tryptophan (W); (see also, e.g., Creighton (1984) Proteins, W.H. Freeman and
Company; Schulz and Schirmer (1979) Principles of Protein Structure, Springer-Verlag). One
of skill in the art will appreciate that the above-identified substitutions are not the only
possible conservative substitutions. For example, for some purposes, one may regard all
charged amino acids as conservative substitutions for each other whether they are positive or
negative. In addition, individual substitutions, deletions or additions that alter, add or delete
a single amino acid or a small percentage of amino acids in an encoded sequence can also be
considered "conservatively modified variations."
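
As a minimal illustration, the six-group guideline above can be encoded as a lookup. The
names below are hypothetical, and the sketch covers only the six listed groups; as noted
above, these are not the only possible conservative substitutions, so residues outside the
groups (e.g., glycine) are simply reported as non-conservative here.

```python
# The six exemplary groups from the text, written with one-letter codes.
CONSERVATIVE_GROUPS = (
    frozenset("AST"),   # 1) Alanine, Serine, Threonine
    frozenset("DE"),    # 2) Aspartic acid, Glutamic acid
    frozenset("NQ"),    # 3) Asparagine, Glutamine
    frozenset("RK"),    # 4) Arginine, Lysine
    frozenset("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    frozenset("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues fall in the same exemplary group."""
    a, b = original.upper(), replacement.upper()
    if len(a) != 1 or len(b) != 1 or a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Under this scheme an isoleucine-to-valine change counts as conservative, while an aspartic
acid-to-lysine change does not.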

The terms "mimetic" and "peptidomimetic" refer to a synthetic chemical
compound that has substantially the same structural and/or functional characteristics of the
polypeptides of the invention (e.g., ability to bind, or "capture," human antibodies in an
ELISA). The mimetic can be either entirely composed of synthetic, non-natural analogues of
amino acids, or can be a chimeric molecule of partly natural peptide amino acids and partly
non-natural analogs of amino acids. The mimetic can also incorporate any amount of natural
amino acid conservative substitutions as long as such substitutions also do not substantially
alter the mimetic's structure and/or activity. As with polypeptides of the invention which are
conservative variants, routine experimentation will determine whether a mimetic is within the
scope of the invention, i.e., that its structure and/or function is not substantially altered.
Polypeptide mimetic compositions can contain any combination of non-natural structural
components, which are typically from three structural groups: a) residue linkage groups other
than the natural amide bond ("peptide bond") linkages; b) non-natural residues in place of
naturally occurring amino acid residues; or c) residues which induce secondary structural
mimicry, i.e., to induce or stabilize a secondary structure, e.g., a beta turn, gamma turn, beta
sheet, alpha helix conformation, and the like. A polypeptide can be characterized as a
mimetic when all or some of its residues are joined by chemical means other than natural
peptide bonds. Individual peptidomimetic residues can be joined by peptide bonds, other
chemical bonds or coupling means, such as, e.g., glutaraldehyde, N-hydroxysuccinimide
esters, bifunctional maleimides, N,N'-dicyclohexylcarbodiimide (DCC) or
N,N'-diisopropylcarbodiimide (DIC). Linking groups that can be an alternative to the traditional
amide bond ("peptide bond") linkages include, e.g., ketomethylene (e.g., -C(=O)-CH2- for
-C(=O)-NH-), aminomethylene (CH2-NH), ethylene, olefin (CH=CH), ether (CH2-O),
thioether (CH2-S), tetrazole (CN4-), thiazole, retroamide, thioamide, or ester (see, e.g.,
Spatola (1983) in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol.
7, pp 267-357, "Peptide Backbone Modifications," Marcel Dekker, NY). A polypeptide can
also be characterized as a mimetic by containing all or some non-natural residues in place of
naturally occurring amino acid residues; non-natural residues are well described in the
scientific and patent literature.

The invention comprises nucleic acids comprising sequences as set forth in
Table 3, or comprising nucleic acids encoding the polypeptides as set forth in Table 4,
operably linked to a transcriptional regulatory sequence. As used herein, the term "operably
linked" refers to a functional relationship between two or more nucleic acid (e.g., DNA)
segments. Typically, it refers to the functional relationship of a transcriptional regulatory
sequence to a transcribed sequence. For example, a promoter (defined below) is operably
linked to a coding sequence, such as a nucleic acid of the invention, if it stimulates or
modulates the transcription of the coding sequence in an appropriate host cell or other
expression system. Generally, promoter transcriptional regulatory sequences that are
operably linked to a transcribed sequence are physically contiguous to the transcribed
sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such
as enhancers, need not be physically contiguous or located in close proximity to the coding
sequences whose transcription they enhance. For example, in one embodiment, a promoter is
operably linked to an ORF-containing nucleic acid sequence of the invention, as exemplified
by, e.g., a nucleic acid sequence as set forth in Table 3.

As used herein, the term "promoter" includes all sequences capable of driving
transcription of a coding sequence in an expression system. Thus, promoters used in the
constructs of the invention include cis-acting transcriptional control elements and regulatory
sequences that are involved in regulating or modulating the timing and/or rate of
transcription of a nucleic acid of the invention. For example, a promoter can be a cis-acting
transcriptional control element, including an enhancer, a promoter, a transcription terminator,
an origin of replication, a chromosomal integration sequence, 5' and 3' untranslated regions,
or an intronic sequence, which are involved in transcriptional regulation. These cis-acting
sequences typically interact with proteins or other biomolecules to carry out (turn on/off,
regulate, modulate, etc.) transcription.

The invention comprises expression cassettes comprising nucleic acids
comprising sequences as set forth in Table 3, or comprising nucleic acids encoding the
polypeptides as set forth in Table 4. The term "expression vector" refers to any recombinant
expression system for the purpose of expressing a nucleic acid sequence of the invention in
vitro or in vivo, constitutively or inducibly, in any cell, including prokaryotic, yeast, fungal,
plant, insect or mammalian cell. The term includes linear or circular expression systems.
The term includes expression systems that remain episomal or integrate into the host cell
genome. The expression systems can have the ability to self-replicate or not, i.e., drive only
transient expression in a cell. The term includes recombinant "expression cassettes" which
contain only the minimum elements needed for transcription of the recombinant nucleic acid.

Alignment Analysis of Sequences 

The nucleic acid and polypeptide sequences of the invention include genes
and gene products identified and characterized by sequence identity analysis (i.e., by
homology) using the exemplary nucleic acid and protein sequences of the invention,
including, e.g., those set forth in Tables 3 and 4. In alternative aspects of the invention,
nucleic acids and polypeptides within the scope of the invention include those having 98%,
95%, 90%, 85% or 80% sequence identity (homology) to the exemplary sequences as set
forth in Tables 3 and 4.

For sequence comparison, typically one sequence acts as a reference
sequence, to which test sequences are compared. When using a sequence comparison
algorithm, test and reference sequences are entered into a computer, subsequence coordinates
are designated, if necessary, and sequence algorithm program parameters are designated.
Default program parameters are used unless alternative parameters are designated herein.
The sequence comparison algorithm then calculates the percent sequence identity for the test
sequence(s) relative to the reference sequence, based on the designated or default program
parameters. A "comparison window", as used herein, includes reference to a segment of any
one of the number of contiguous positions selected from the group consisting of from 25 to
600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence
may be compared to a reference sequence of the same number of contiguous positions after
the two sequences are optimally aligned. Methods of alignment of sequences for comparison
are well-known in the art. Optimal alignment of sequences for comparison can be conducted,
e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981),
by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970),
by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA
85:2444 (1988), by computerized implementations of these algorithms (CLUSTAL, GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics
Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual
inspection.
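
As an illustration of how a percent sequence identity can be derived from a global alignment, the following is a minimal sketch in the spirit of the Needleman & Wunsch approach cited above. It is not the implementation used by any of the named programs. The function names, the scoring values (match +1, mismatch -1, gap -1) and the choice of the aligned length as the denominator are assumptions made only for this example.

```python
def needleman_wunsch(ref, test, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return the two gapped strings.

    Scoring values are illustrative assumptions, not program defaults.
    """
    n, m = len(ref), len(test)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == test[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_ref, out_test = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == test[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_ref.append(ref[i - 1])
                out_test.append(test[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_test.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_test.append(test[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_test))


def percent_identity(aligned_ref, aligned_test):
    """Percent of aligned columns holding identical, non-gap residues."""
    if not aligned_ref:
        return 0.0
    same = sum(1 for x, y in zip(aligned_ref, aligned_test)
               if x == y and x != "-")
    return 100.0 * same / len(aligned_ref)


if __name__ == "__main__":
    a, b = needleman_wunsch("ATGGACGC", "ATGGTCGC")
    print(a, b, percent_identity(a, b))
```

Different programs normalize identity differently (by alignment length, by the shorter sequence, or over a comparison window), so a value computed this way is only comparable to thresholds such as 80% or 95% when the same convention is applied throughout.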

In one aspect of the invention (in the methods of the invention, and, to
determine if a sequence is within the scope of the invention), a CLUSTAL algorithm is used,
e.g., the CLUSTAL W program, see, e.g., Thompson (1994) Nuc. Acids Res. 22:4673-4680;
Higgins (1996) Methods Enzymol 266:383-402. Variations can also be used, such as
CLUSTAL X, see Jeanmougin (1998) Trends Biochem Sci 23:403-405; Thompson (1997)
Nucleic Acids Res 25:4876-4882. In one aspect, the CLUSTAL W program described by
Thompson (1994) supra, is used with the following parameters: K tuple (word) size: 1,
window size: 5, scoring method: percentage, number of top diagonals: 5, gap penalty: 3, to
determine whether a nucleic acid has sufficient sequence identity to an exemplary sequence
to be within the scope of the invention. In another aspect, the algorithm PILEUP is used in the
methods and to determine whether a nucleic acid has sufficient sequence identity to be within
the scope of the invention. This program creates a multiple sequence alignment from a group
of related sequences using progressive, pairwise alignments to show relationship and percent
sequence identity. It also plots a tree or dendrogram showing the clustering relationships used
to create the alignment. PILEUP uses a simplification of the progressive alignment method
of Feng & Doolittle, J. Mol. Evol. 35:351-360 (1987). The method used is similar to the
method described by Higgins & Sharp, CABIOS 5:151-153 (1989). Using PILEUP, a
reference sequence (e.g., an exemplary GCA-associated sequence of the invention) is
compared to another sequence to determine the percent sequence identity relationship (i.e.,
that the second sequence is substantially identical and within the scope of the invention)
using the following parameters: default gap weight (3.00), default gap length weight (0.10),
and weighted end gaps. In one embodiment, PILEUP obtained from the GCG sequence
analysis software package, e.g., version 7.0 (Devereaux (1984) Nuc. Acids Res. 12:387-395),
using the parameters described therein, is used in the methods and to identify nucleic acids
within the scope of the invention. In another aspect, a BLAST algorithm is used (in the
methods, e.g., to determine percent sequence identity (i.e., substantial similarity or identity)
and whether a nucleic acid is within the scope of the invention), see, e.g., Altschul (1990) J.
Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available
through the National Center for Biotechnology Information, NIH. This algorithm involves
first identifying high scoring sequence pairs (HSPs) by identifying short words of length W
in the query sequence, which either match or satisfy some positive-valued threshold score T
when aligned with a word of the same length in a database sequence. T is referred to as the
neighborhood word score threshold (Altschul (1990) supra). These initial neighborhood
word hits act as seeds for initiating searches to find longer HSPs containing them. The word
hits are then extended in both directions along each sequence for as far as the cumulative
alignment score can be increased. Cumulative scores are calculated using, for nucleotide
sequences, the parameters M (reward score for a pair of matching residues; always > 0) and
N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring
matrix is used to calculate the cumulative score. Extension of the word hits in each direction
is halted when: the cumulative alignment score falls off by the quantity X from its
maximum achieved value; the cumulative score goes to zero or below, due to the
accumulation of one or more negative-scoring residue alignments; or the end of either
sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. In one embodiment, to determine if a nucleic acid
sequence is within the scope of the invention, the BLASTN program (for nucleotide
sequences) is used incorporating as defaults a wordlength (W) of 11, an expectation (E) of
10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP
program uses as default parameters a wordlength (W) of 3, an expectation (E) of 10, and the
BLOSUM62 scoring matrix (see, e.g., Henikoff (1989) Proc. Natl. Acad. Sci. USA
89:10915).
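
The word-hit extension step described above can be sketched as follows for nucleotide sequences. This is a simplified, ungapped illustration and not the BLAST source code: the function names are hypothetical, the seed is assumed to be an exact word match scoring W x M, each direction is extended independently from the seed score, and the drop-off value X=20 is an arbitrary assumption (the text above does not give a value for X).

```python
def _extend(query, subject, qi, si, step, start_score, M, N, X):
    """Extend in one direction; return (best score, residues kept)."""
    best = score = start_score
    best_steps = steps = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += M if query[qi] == subject[si] else N
        steps += 1
        if score > best:
            best, best_steps = score, steps
        elif best - score >= X or score <= 0:
            # Stop: score fell off by X from its maximum, or reached zero.
            break
        qi += step
        si += step
    # Leaving the loop without a break means a sequence end was reached.
    return best, best_steps


def extend_word_hit(query, subject, q_pos, s_pos, W=11, M=5, N=-4, X=20):
    """Extend an exact word hit of length W in both directions.

    Returns (HSP score, query start, query end), end exclusive.
    X=20 is an illustrative assumption, not a documented default.
    """
    seed = W * M
    right_best, right_len = _extend(query, subject, q_pos + W, s_pos + W,
                                    +1, seed, M, N, X)
    left_best, left_len = _extend(query, subject, q_pos - 1, s_pos - 1,
                                  -1, seed, M, N, X)
    total = right_best + left_best - seed  # count the seed only once
    return total, q_pos - left_len, q_pos + W + right_len


if __name__ == "__main__":
    q = "TTACGTACGTAAGG"
    s = "CCACGTACGTAACC"
    print(extend_word_hit(q, s, 2, 2, W=4))
```

A smaller X stops the extension sooner after a run of mismatches, which is faster but can miss HSPs that would recover; this is the sensitivity and speed trade-off that the parameters W, T, and X control.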

Hybridization for Identifying Nucleic Acids of the Invention 

Nucleic acids within the scope of the invention include isolated or
recombinant nucleic acids that specifically hybridize under stringent hybridization conditions
to an exemplary nucleic acid of the invention (including a sequence encoding an exemplary
polypeptide) as set forth in Tables 3 and 4. Stringent conditions are sequence-dependent and
will be different in different circumstances. Longer sequences hybridize specifically at
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in,
e.g., Tijssen (1993) infra. Generally, stringent conditions are selected to be about 5 to 10°C
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic
strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic
acid concentration) at which 50% of the probes complementary to the target hybridize to the
target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of
the probes are occupied at equilibrium). Stringent conditions will be those in which the salt
concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion
concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for
short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater
than 50 nucleotides). Stringent conditions may also be achieved with the addition of
destabilizing agents such as formamide.
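
The rule that stringent conditions lie about 5 to 10°C below the Tm can be illustrated with a short calculation. The text above does not specify how Tm is to be estimated. The sketch below assumes the common rule of thumb for short oligonucleotide probes, Tm = 2°C x (A+T) + 4°C x (G+C), which is only a rough approximation and ignores ionic strength, pH, and formamide. The function names are hypothetical.

```python
def estimate_tm_short_probe(probe):
    """Rough Tm in degrees C for a short DNA probe.

    Assumption: 2 degrees per A/T and 4 degrees per G/C; this rule of thumb
    is not taken from the source text.
    """
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    if at + gc != len(probe):
        raise ValueError("probe must contain only A, C, G and T")
    return 2 * at + 4 * gc


def stringent_temperature_range(probe):
    """Return (low, high) in degrees C, i.e. Tm - 10 to Tm - 5."""
    tm = estimate_tm_short_probe(probe)
    return tm - 10, tm - 5


if __name__ == "__main__":
    print(stringent_temperature_range("ATGCATGCATGCATGC"))
```

For longer probes, or for buffers containing formamide, an empirically determined Tm or a salt-corrected formula would be needed in place of this estimate.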

For selective or specific hybridization, a positive signal (e.g., identification of
a nucleic acid of the invention) is about 10 times background hybridization. "Stringent"
hybridization conditions that are used to identify substantially identical nucleic acids within
the scope of the invention include hybridization in a buffer comprising 50% formamide, 5x
SSC, and 1% SDS at 42°C, or hybridization in a buffer comprising 5x SSC and 1% SDS at
65°C, both with a wash of 0.2x SSC and 0.1% SDS at 65°C. Exemplary "moderately
stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1
M NaCl, and 1% SDS at 37°C, and a wash in 1X SSC at 45°C. Those of ordinary skill will
readily recognize that alternative but comparable hybridization and wash conditions can be
utilized to provide conditions of similar stringency. Nucleic acids which do not hybridize to
each other under stringent hybridization conditions are still substantially identical if the
polypeptides which they encode are substantially identical. This may occur, e.g., when a
copy of a nucleic acid is created using the maximum codon degeneracy permitted by the
genetic code, as discussed herein (see discussion on "conservative substitutions"). However,
the selection of a hybridization format is not critical - it is the stringency of the wash
conditions that sets forth the conditions that determine whether a nucleic acid is within the
scope of the invention. Wash conditions used to identify nucleic acids within the scope of
the invention include, e.g.: a salt concentration of about 0.02 molar at pH 7 and a temperature
of at least about 50°C or about 55°C to about 60°C; or, a salt concentration of about 0.15 M
NaCl at 72°C for about 15 minutes; or, a salt concentration of about 0.2X SSC at a
temperature of at least about 50°C or about 55°C to about 60°C for about 15 to about 20
minutes; or, the hybridization complex is washed twice with a solution with a salt
concentration of about 2X SSC containing 0.1% SDS at room temperature for 15 minutes
and then washed twice by 0.1X SSC containing 0.1% SDS at 68°C for 15 minutes; or,
equivalent conditions. See Sambrook, Tijssen and Ausubel (see below) for a description of
SSC buffer and equivalent conditions.

General Techniques 

The nucleic acid and polypeptide sequences of the invention and other nucleic
acids used to practice this invention, whether RNA, cDNA, genomic DNA, vectors, viruses
or hybrids thereof, may be isolated from a variety of sources, genetically engineered,
amplified, and/or expressed recombinantly. Any recombinant expression system can be
used, including, in addition to bacterial cells, e.g., mammalian, yeast, insect or plant cell
expression systems.

Alternatively, these nucleic acids and polypeptides can be synthesized in vitro
by well-known chemical synthesis techniques, as described in, e.g., Caruthers (1982) Cold
Spring Harbor Symp. Quant. Biol. 47:411-418; Adams (1983) J. Am. Chem. Soc. 105:661;
Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med.
19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol.
68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S.
Patent No. 4,458,066.

Techniques for the manipulation of nucleic acids, such as, e.g., generating
mutations in sequences, subcloning, labeling probes, sequencing, hybridization and the like
are well described in the scientific and patent literature, see, e.g., Sambrook, ed.,
Molecular Cloning: a Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor
Laboratory, (1989); Current Protocols in Molecular Biology, Ausubel, ed. John Wiley
& Sons, Inc., New York (1997); Laboratory Techniques in Biochemistry and
Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and
Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993).

Polypeptides and peptides of the invention can also be synthesized, whole or
in part, using chemical methods well known in the art. See e.g., Caruthers (1980) Nucleic
Acids Res. Symp. Ser. 215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232;
Banga, A.K., Therapeutic Peptides and Proteins, Formulation, Processing and Delivery
Systems (1995) Technomic Publishing Co., Lancaster, PA. For example, peptide synthesis
can be performed using various solid-phase techniques (see e.g., Roberge (1995) Science
269:202; Merrifield (1997) Methods Enzymol. 289:3-13) and automated synthesis may be
achieved, e.g., using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with
the instructions provided by the manufacturer.

The skilled artisan will recognize that individual synthetic residues and
polypeptides incorporating mimetics can be synthesized using a variety of procedures and
methodologies, which are well described in the scientific and patent literature, e.g., Organic
Syntheses Collective Volumes, Gilman, et al. (Eds) John Wiley & Sons, Inc., NY.
Polypeptides incorporating mimetics can also be made using solid phase synthetic
procedures, as described, e.g., by Di Marchi, et al., U.S. Pat. No. 5,422,426. Peptides and
peptide mimetics of the invention can also be synthesized using combinatorial
methodologies. Various techniques for generation of peptide and peptidomimetic libraries
are well known, and include, e.g., multipin, tea bag, and split-couple-mix techniques; see,
e.g., al-Obeidi (1998) Mol. Biotechnol. 9:205-223; Hruby (1997) Curr. Opin. Chem. Biol.
1:114-119; Ostergaard (1997) Mol. Divers. 3:17-27; Ostresh (1996) Methods Enzymol.
267:220-234. Modified peptides of the invention can be further produced by chemical
modification methods, see, e.g., Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel
(1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896.

Peptides and polypeptides of the invention can also be synthesized and
expressed as fusion proteins with one or more additional domains linked thereto for, e.g.,
producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized
peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like.
Detection and purification facilitating domains include, e.g., metal chelating peptides such as
polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized
metals, protein A domains that allow purification on immobilized immunoglobulin, and the
domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle
WA). The inclusion of cleavable linker sequences such as Factor Xa or enterokinase
(Invitrogen, San Diego CA) between the purification domain and GCA-associated peptide or
polypeptide can be useful to facilitate purification. For example, an expression vector can
include an epitope-encoding nucleic acid sequence linked to six histidine residues followed
by a thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry
34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues
facilitate detection and purification while the enterokinase cleavage site provides a means for
purifying the epitope from the remainder of the fusion protein. Technology pertaining to
vectors encoding fusion proteins and application of fusion proteins are well described in the
scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol. 12:441-53.

The invention provides antibodies that specifically bind to the polypeptides of
the invention, as set forth in Table 4. These antibodies can be useful in the screening
methods of the invention. The polypeptide or peptide can be conjugated to another
molecule or can be administered with an adjuvant. The coding sequence can be part of an
expression cassette or vector capable of expressing the immunogen in vivo (see, e.g.,
Katsumi (1994) Hum. Gene Ther. 5:1335-9). Methods of producing polyclonal and
monoclonal antibodies are known to those of skill in the art and described in the scientific
and patent literature, see, e.g., Coligan, Current Protocols in Immunology,
Wiley/Greene, NY (1991); Stites (eds.) Basic and Clinical Immunology (7th ed.) Lange
Medical Publications, Los Altos, CA; Goding, Monoclonal Antibodies: Principles and
Practice (2d ed.) Academic Press, New York, NY (1986); Harlow (1988) Antibodies, a
Laboratory Manual, Cold Spring Harbor Publications, New York.

Antibodies also can be generated in vitro, e.g., using recombinant antibody
binding site expressing phage display libraries, in addition to the traditional in vivo methods
using animals. See, e.g., Huse (1989) Science 246:1275; Ward (1989) Nature 341:544;
Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys.
Biomol. Struct. 26:27-45. Human antibodies can be generated in mice engineered to produce
only human antibodies, as described by, e.g., U.S. Patent No. 5,877,397; 5,874,299;
5,789,650; and 5,939,598. B-cells from these mice can be immortalized using standard
techniques (e.g., by fusing with an immortalizing cell line such as a myeloma or by
manipulating such B-cells by other techniques to perpetuate a cell line) to produce a
monoclonal human antibody-producing cell. See, e.g., U.S. Patent No. 5,916,771; 5,985,615.

TABLE 3 

>Rv0002 dnaN DNA polymerase III, b-subunit TB.seq 2052:3257 MW:42114
>emb|AL123456|MTBH37RV:2052-3260. dnaN SEQ ID NO:1

ATGGACGCGGCTACGACAAGAGTTGGCCTCACCGACTTGACGTTTCGTTTGCTACGAGAGTCTT 
TCGCCGATGCGGTGTCGTGGGTGGCTAAAAATCTGCCAGCCAGGCCCGCGGTGCCGGTGCTCT 
CCGGCGTGTTGTTGACCGGCTCGGACAACGGTCTGACQATTTCCGGATTCGACTACGAGGTTTC 
CGCCGAGGCCCAGGTTGGCGCTGAAATTGTTTCTCCTGGAAGCGTTTTAGTTTCTGGCCGATTG 
TTGTCCGATATTACCCGGGCGTTGCCTAACAAGCCCGTAGACGTTCATGTCGAAGGTAACCGGG 
TCGCATTGACCTGCGGTAACGCCAGGTTTTCGCTACCGACGATGCCAGTCGAGGATTATCCGAC 
GCTGCCGACGCTGCCGGAAGAGACCGGATTGTTGCCTGCGGAATTATTCGCCGAGGCAATCAG 
TCAGGTCGCTATCGCCGCCGGCCGGGACGACACGTTGCCTATGTTGACCGGCATCCGGGTCGA 
AATCCTCGGTGAGACGGTGGTTTTGGCCGCTACCGACAGGTTTCGCCTGGCTGTTCGAGAACTG 
AAGTGGTCGGCGTCGTCGCCAGATATCGAAGCGGCTGTGCTGGTCCCGGCCAAGACGCTGGC 
CGAGGCCGCCAAAGCGGGCATCGGCGGCTCTGACGTTCGTTTGTCGTTGGGTACTGGGCCGG 
GGGTGGGCAAGGATGGCCTGCTCGGTATCAGTGGGAACGGCAAGCGCAGCACCACGCGACTT 
CTTGATGCCGAGTTCCCGAAGTTTCGGCAGTTGCTACCAACCGAACACACCGCGGTGGCCACC 
ATGGACGTGGCCGAGTTGATCGAAGCGATCAAGCTGGTTGCGTTGGTAGCTGATCGGGGCGCG 
CAGGTGCGCATGGAGTTCGCTGATGGCAGCGTGCGGCTTTCTGCGGGTGCCGATGATGTTGGA 
CGAGCCGAGGAAGATCTTGTTGTTGACTATGCCGGTGAACCATTGACGATTGCGTTTAACCCAA 
CCTATCTAACGGACGGTTTGAGTTCGTTGCGCTCGGAGCGAGTGTCTTTCGGGTTTACGACTGC 
GGGTAAGCCTGCCTTGCTACGTCCGGTGTCCGGGGACGATCGCCCTGTGGCGGGTCTGAATGG 
CAACGGTCCGTTCCCGGCGGTGTCGACGGACTATGTCTATCTGTTGATGCCGGTTCGGTTGCCG 

GGCTGA

>Rv0003 recF DNA replication and SOS Induction TB.seq 3280:4434 MW:42181
>emb|AL123456|MTBH37RV:3280-4437. recF SEQ ID NO:2 

GTGTACGTCCGTCATTTGGGGCTGCGTGACTTCCGGTCCTGGGCATGTGTAGATCTGGAATTGC 
ATCCAGGGCGGACGGTTTTTGTTGGGCCTAACGGTTATGGTAAGACGAATCTTATTGAGGCACT
GTGGTATTCGACGACGTTAGGTTCGCACCGCGTTAGCGCCGATTTGCCGTTGATCCGGGTAGGT

ACCGATCGTGCGGTGATCTCCACGATCGTGGTGAACGACGGTAGAGAATGTGCCGTCGACCTC 

GAGATCGCCACGGGGCGAGTCAAGAAAGCGCGATTGAATCGATCATCGGTCCGAAGTACACGT 

GATGTGGTCGGAGTGCTTCGAGCTGTGTTGTTTGCCCCTGAGGATCTGGGGTTGGTTCGTGGG 

GATCCCGCTGACCGGCGGCGCTATCTGGATGATCTGGCGATCGTGCGTAGGCCTGCGATCGCT 

GCGGTACGAGCCGAATATGAGAGGGTG-rTGCGCCAGCGGACGGCGTTATTGAAGTCCGTACCT 

GGAGCACGGTATCGGGGTGACCGGGGTGTGTTTGACACTCTTGAGGTATGGGACAGTCGTTTG 

GCGGAGCACGGGGCTGAACTGGTGGCCGCCCGCATCGATTTGGTCAACCAGTTGGCACCGGA 

AGTGAAGAAGGCATACCAGCTGTTGGCGCCGGAATCGCGATCGGCGTCTATCGGTTATCGGGC 

CAGCATGGATGTAAGCGGTCCCAGCGAGCAGTCAGATATCGATCGGCAATTGTTAGCAGCTCGG 

CTGTTGGCGGCGCTGGCGGCCCGTCGGGATGCCGAACTCGAGCGTGGGGTTTGTCTAGTTGGT 

CCGCACCGTGACGACCTAATACTGCGACTAGGCGATCAACCCGCGAAAGGATTTGCTAGCCATG 

GGGAGGCGTGGTCGTTGGCGGTGGCACTGCGGTTGGCGGCCTATCAACTGTTACGCGTTGATG 

GTGGTGAGCCGGTGTTGTTGCTCGACGACGTGTTCGCCGAACTGGATGTCATGCGCCGTCGAG 

CGTTGGCGACGGCGGCCGAGTCCGCCGAACAGGTGTTGGTGACTGCCGCGGTGCTCGAGGAT 

ATTCCCGCCGGCTGGGACGCCAGGCGGGTGCACATCGATGTGCGTGCCGATGACACCGGATC 

GATGTCGGTGGTTCTGCCATGA 

>Rv0005 gyrB DNA gyrase subunit B TB.seq 5123:7264 MW:78441
>emb|AL123456|MTBH37RV:5123-7267. gyrB SEQ ID NO:3

ATGGGTAAAAACGAGGCCAGAAGATCGGCCCTGGCGCCCGATCACGGTACAGTGGTGTGCGAC 

CCCCTGCGGCGACTCAACCGCATGCACGCAACCCCTGAGGAGAGTATTCGGATCGTGGCTGCC 

CAGAAAAAGAAGGCCCAAGACGAATACGGCGCTGCGTCTATCACCATTCTCGAAGGGCTGGAG 

GCCGTCCGCAAACGTCCCGGCATGTACATTGGCTCGACCGGTGAGCGCGGTTTACACCATCTC 

ATTTGGGAGGTGGTCGACAACGCGGTCGACGAGGCGATGGCCGGTTATGCAACCACAGTGAAC 

GTAGTGCTGCTTGAGGATGGCGGTGTCGAGGTCGCCGACGACGGCCGCGGCATTCCGGTCGG 

CACCCACGCCTCCGGCATACCGACCGTCGACGTGGTGATGACACAACTACATGCCGGCGGCAA 

GTTCGACTCGGACGCGTATGCGATATCTGGTGGTCTGCACGGCGTCGGCGTGTCGGTGGTTAA 

CGCGCTATCCACCCGGCTCGAAGTCGAGATCAAGCGCGACGGGTACGAGTGGTCTCAGGTTTA 

TGAGAAGTCGGAACCCCTGGGCCTCAAGCAAGGGGCGCCGACCAAGAAGACGGGGTCAACGG 

TGCGGTTCTGGGCCGACCCCGCTGTTTTGGAAACCACGGAATACGACTTCGAAACCGTCGCCC 

GCCGGCTGCAAGAGATGGCGTTCCTCAACAAGGGGCTGACCATCAACCTGACCGACGAGAGGG 

TGACCCAAGACGAGGTCGTCGACGAAGTGGTCAGCGACGTCGCCGAGGCGCCGAAGTCGGCA 

AGTGAACGCGCAGCCGAATCCACTGCACCGCACAAAGTTAAGAGCCGCACCTTTCACTATCCGG 

GTGGCCTGGTGGACTTCGTGAAACACATCAACCGCACCAAGAACGCGATTCATAGCAGCATCGT 

GGACTTTTCCGGCAAGGGCACCGGGCACGAGGTGGAGATCGCGATGCAATGGAACGCCGGGT 

ATTCGGAGTCGGTGCACACCTTCGCCAACACCATCAACACCCACGAGGGCGGCACCCACGAAG
AGGGCTTCCGCAGCGCGCTGACGTCGGTGGTGAACAAGTACGCCAAGGACCGCAAGCTACTGA

AGGACAAGGACCCCAACCTCACCGGTGACGATATCCGGGAAGGCCTGGCCGCTGTGATCTCGG 

TGAAGGTCAGCGAACCGCAGTTCGAGGGCCAGACCAAGACCAAGTTGGGCAACACCGAGGTCA 

AATCGTTTGTGCAGAAGGTCTGTAACGAACAGCTGACCCACTGGTTTGAAGCCAACCCCACCGA 

CGCGAAAGTCGTTGTGAACAAGGCTGTGTCCTCGGCGCAAGCCCGTATCGCGGCACGTAAGGC 

ACGAGAGTTGGTGCGGCGTAAGAGCGCCACCGACATCGGTGGATTGCCCGGCAAGCTGGCCG 

ATTGCCGTTCCACGGATCCGCGCAAGTCCGAACTGTATGTCGTAGAAGGTGACTCGGCCGGCG 

GTTCTGCAAAAAGCGGTCGCGATTCGATGTTCCAGGCGATACTTCCGCTGCGCGGCAAGATCAT 

CAATGTGGAGAAAGCGCGCATCGACCGGGTGCTAAAGAACACCGAAGTTCAGGCGATCATCAC 

GGCGCTGGGCACCGGGATCCACGACGAG7TCGATATCGGCAAGCTGCGCTACCACAAGATCGT 

GCTGATGGCCGACGCCGATGTTGACGGCCAACATATTTCCACGCTGTTGTTGACGTTGTTGTTC 

CGGTTCATGCGGCCGCTCATCGAGAACGGGCATGTGTTTTTGGCACAACCGCCGCTGTACAAAC 

TCAAGTGGCAGCGCAGTGACCCGGAATTCGCATACTCCGACCGCGAGCGCGACGGTCTGCTGG 

AGGCGGGGCTGAAGGCCGGGAAGAAGATCAACAAGGAAGACGGCATTCAGCGGTACAAGGGT 

CTAGGTGAAATGGACGCTAAGGAGTTGTGGGAGACCACCATGGATCCCTCGGTTCGTGTGTTGC 

GTCAAGTGACGCTGGACGACGCCGCCGCCGCCGACGAGTTGTTCTCCATCCTGATGGGCGAGG 

ACGTCGACGCGCGGCGCAGCTTTATCACCCGCAACGCCAAGGATGTTCGGTTCCTGGATGTCTA 

A 

>Rv0006 gyrA DNA gyrase subunit A TB.seq 7302:9815 MW:92276 
>emb|AL123456|MTBH37RV:7302-9818. gyrA SEQ ID NO:4 

ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCGAG 

CAGGAGATGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCG 

GAGGTGCGCGACGGGCTCAAGCCCGTGCATCGCCGGGTGCTCTATGCAATGTTCGATTCCGGC 

TTCCGCCCGGACCGCAGCCACGCCAAGTCGGCCCGGTCGGTTGCCGAGACCATGGGCAACTA 

CCACCCGCACGGCGAGGCGTCGATCTACGACAGCCTGGTGCGCATGGCCCAGCCCTGGTCGC 

TGCGCTACCCGCTGGTGGACGGCCAGGGCAACTTCGGCTCGCCAGGCAATGACCCACCGGCG 

GCGATGAGGTACAGCGAAGCCCGGCTGACCCCGTTGGCGATGGAGATGCTGAGGGAAATCGAC 

GAGGAGACAGTCGATTTCATCCCTAACTACGACGGCCGGGTGCAAGAGCCGACGGTGCTACCC 

AGCCGGTTCCCCAACCTGCTGGCCAACGGGTCAGGCGGCATCGCGGTCGGCATGGCAACCAAT 

ATCCCGCCGCACAACCTGCGTGAGCTGGCCGACGCGGTGTTCTGGGCGCTGGAGAATCACGAC 

GCCGACGAAGAGGAGACCCTGGCCGCGGTCATGGGGCGGGTTAAAGGCCCGGACTTCCCGAC 

CGCCGGACTGATCGTCGGATCCCAGGGCACCGCTGATGCCTACAAAACTGGCCGCGGCTCCAT 

TCGAATGCGCGGAGTTGTTGAGGTAGAAGAGGATTCCCGCGGTCGTACCTCGCTGGTGATCAC 

CGAGTTGCCGTATCAGGTCAACCACGACAAGTTCATCACTTCGATCGCCGAACAGGTCCGAGAC 

GGCAAGCTGGCCGGCATTTCCAACATTGAGGACCAGTCTAGCGATCGGGTCGGTTTACGCATC 

GTCATCGAGATCAAGCGCGATGCGGTGGCCAAGGTGGTGATCAATAACCTTTACAAGCACACCC
AGCTGCAGACCAGCTTTGGCGCCAACATGCTAGCGATCGTCGACGGGGTGCCGCGCACGCTGC
GGCTGGACCAGCTGATCCGCTATTACGTTGACCACCAACTCGACGTCATTGTGCGGCGCACCAG 
CTACCGGCTGCGCAAGGCAAACGAGCGAGCCCACATTCTGCGCGGCCTGGTTAAAGCGCTCGA 
CGCGCTGGACGAGGTCATTGCACTGATCCGGGCGTCGGAGACCGTCGATATCGCCCGGGCCG 

GACTGATCGAGCTGCTCGACATCGACGAGATCCAGGCCCAGGCAATCCTGGACATGCAGTTGC
GGCGCCTGGCCGCACTGGAACGCCAGCGCATCATCGACGACCTGGCCAAAATCGAGGCCGAG 
ATCGCCGATCTGGAAGACATCCTGGCAAAACCCGAGCGGCAGCGTGGGATCGTGCGCGACGAA 
CTCGCCGAAATCGTGGACAGGCACGGCGACGACCGGCGTACCCGGATCATCGCGGCCGACGG 
AGACGTCAGCGACGAGGATTTGATCGCCCGCGAGGACGTCGTTGTCACTATCACCGAAACGGG 

ATACGCCAAGCGCACCAAGACCGATCTGTATCGCAGCCAGAAACGCGGCGGCAAGGGCGTGCA
GGGTGCGGGGTTGAAGCAGGACGACATCGTCGCGCACTTCTTCGTGTGCTCCACCCACGATTT 
GATCCTGTTCTTCACCACCCAGGGACGGGTTTATCGGGCCAAGGCCTACGACTTGCCCGAGGC 
CTCCCGGACGGCGCGCGGGGAGCACGTGGCCAACCTGTTAGCCTTCCAGCCCGAGGAACGCA 
TCGCCCAGGTCATCCAGATTCGCGGCTACACCGACGCCCCGTACCTGGTGCTGGCCACTCGCA 

ACGGGCTGGTGAAAAAGTCCAAGCTGACCGACTTCGACTCCAATCGCTCGGGCGGAATCGTGG
CGGTCAACCTGCGCGACAACGACGAGGTGGTCGGTGCGGTGCTGTGTTCGGCCGGCGACGAC 
CTGCTGCTGGTCTCGGCCAACGGGCAGTCCATCAGGTTCTCGGCGACCGACGAGGCGCTGCG 
GCCAATGGGTCGTGCCACCTCGGGTGTGCAGGGCATGCGGTTCAATATCGACGACCGGCTGCT 
GTCGGTGAACGTCGTGCGTGAAGGCACCTATCTGCTGGTGGCGACGTCAGGGGGCTATGCGAA 

ACGTACCGCGATCGAGGAATACCCGGTACAGGGCCGCGGCGGTAAAGGTGTGCTGACGGTCAT
GTACGACCGCCGGCGCGGCAGGTTGGTTGGGGCGTTGATTGTCGACGACGACAGCGAGCTGT 
ATGCCGTCACTTCCGGCGGTGGCGTGATCCGCACCGCGGCACGCCAGGTTCGCAAGGCGGGA 
CGGCAGACCAAGGGTGTTCGGTTGATGAATCTGGGCGAGGGCGACACACTGTrGGCCATCGCG 
CGCAACGCCGAAGAAAGTGGCGACGATAATGCCGTGGACGCCAACGGCGCAGACCAGACGGG 

CAATTAA

>Rv0014c pknB serine-threonine protein kinase TB.seq 15593:17470 MW:66511
>emb|AL123456|MTBH37RV:c17470-15590. pknB SEQ ID NO:5

ATGACCACCCCTTCCCACCTGTCCGACCGCTACGAACTTGGCGAAATCCTTGGATTTGGGGGCA 
TGTCCGAGGTCCACCTGGCCCGCGACCTCCGGTTGCACCGCGACGTTGCGGTCAAGGTGCTGC
GCGCTGATCTAGCCCGCGATCCCAGTTTTTACCTTCGCTTCCGGCGTGAGGCGCAAAACGCCG 
CGGCATTGAACCACCCTGCAATCGTCGCGGTCTACGACACCGGTGAAGCCGAAACGCCCGCCG 
GGCCATTGCCCTACATCGTCATGGAATACGTCGACGGCGTTACCCTGCGCGACATTGTCCACAC 
CGAAGGGCGGATGACGCCCAAAGGCGCCATCGAGGTCATCGCCGACGCCTGCCAAGCGGTGA 
ACTTCAGTCATCAGAACGGAATCATCCACCGTGACGTCAAGCCGGCGAACATCATGATCAGCGC
GACCAATGCAGTAAAGGTGATGGATTTCGGCATCGCCCGCGCCATTGCCGACAGCGGCAACAG 
CGTGACCCAGACCGCAGCAGTGATCGGCACGGCGCAGTACCTGTCACCCGAACAGGCCCGGG
GTGATTCCGTCGACGCCCGATCCGATGTCTATTCCTTGGGCTGTGTTCTTTATGAAGTCCTCACC

GGGGAGCCACCTTTCACCGGCGACTCACCCGTCTCGGTTGCGTACCAACATGTGCGCGAAGAC 

CCGATCCCACCTTCGGCGCGGCACGAAGGCCTCTCCGCCGACCTGGACGCCGTCGTTCTCAAG 

GCGCTGGCXAAAAATCCGGAAAACCGCTATCAGACAGCGGCGGAGATGCGCGCCGACCTGGTC 

CGCGTGCACAACGGTGAGCCGCCCGAGGCGCCCAAAGTGCTCACCGATGCCGAGCGGACCTC 

GCTGCTGTCGTCTGCGGCCGGCAACCTTAGCGGTCCGCGCACCGATCCGCTACCACGCCAGGA 

CTTAGACGACACCGACCGTGACCGCAGCATCGGTTCGGTGGGCCGTTGGGTTGCGGTGGTCGC 

CGTGCTCGCTGTGCTGACCGTCGTGGTAACCATCGCCATCAACACGTTCGGCGGCATCACCCG 

CGACGTTCAAGTTCCCGACGTTCGGGGTCAATCCTCCGCCGACGCCATCGCCACACTGCAAAA 

CCGGGGCTTCAAAATCCGCACCTTGCAGAAGCCGGACTCGACAATCCCACCGGACCACGTTAT 

CGGCACCGACCCGGCCGCCAACACGTCGGTGAGTGCAGGCGACGAGATCACAGTCAACGTGT 

CCACCGGACCCGAGCAACGCGAAATACCCGACGTCTCCACGCTGACATACGCCGAAGCGGTCA 

AGAAACTGACTGCCGCCGGATTCGGCCGCTTCAAGCAAGCGAATTCGCCGTCCACCCCGGAAC 

TGGTGGGCAAGGTCATCGGGACCAACCCGCCAGCCAACCAGACGTCGGCCATCACCAATGTGG 

TCATCATCATCGTTGGCTCTGGTCCGGCGACCAAAGACATTCCCGATGTCGCGGGCCAGACCGT 

CGACGTGGCGCAGAAGAACCTCAACGTCTACGGCTTCACCAAATTCAGTCAGGCCTCGGTGGA 

CAGCCCCCGTCCCGCCGGCGAGGTGACCGGCACCAATCCACCCGCAGGCACCACAGTTCCGG 

TCGATTCAGTCATCGAACTACAGGTGTCCAAGGGCAACCAATTCGTCATGCCXJGACCTATCCGG 

CATGTTCTGGGTCGACGCCGAACCACGATTGCGCGCGCTGGGCTGGACCGGGATGCTCGACAA 

AGGGGCCGACGTCGACGCCGGTGGCTCCCAACACAACCGGGTCGTCTATCAAAACCCGCCGG 

CGGGGACCGGCGTCAACCGGGACGGCATCATCACGCTGAGGTTCGGCCAGTAG 

>Rv0016c pbpA TB.seq 18762:20234 MW:51577
>emb|AL123456|MTBH37RV:c20234-18759. pbpA SEQ ID NO:6

ATGAACGCCTCTCTGCGCCGAATATCGGTGACCGTGATGGCGTTGATCGTGTTGCTACTGCTCA 

ACGCGACCATGACGCAGGTCTTCACCGCCGACGGGCTGCGTGCCGATCCCCGCAACCAGCGA 

GTGTTGCTCGACGAGTATTCACGGCAGCGCGGCCAGATCACCGCTGGTGGCCAACTGCTGGCG 

TACTCGGTAGCCACCGACGGCCGCTTTCGTTTCCTGCGGGTCTATCCCAATCCTGAGGTGTACG 

CGCCGGTTACCGGCTTCTACTCCCTGCGCTATTCCAGCACCGCCCTAGAACGAGCCGAGGACC 

CGATATTGAACGGGTCCGACCGCCGTCTGTTCGGCCGCCGGCTGGCCGACTTCTTCACCGGTC 

GCGACCCACGCGGCGGTAATGTCGATACCACGATCAACCCGCGCATTCAGCAAGCCGGCTGGG 

ACGCGATGCAGCAAGGCTGCTACGGGCCCTGTAAGGGAGCGGTGGTCGCCCTTGAGCCATCAA 

CCGGCAAGATTTTGGCGTTGGTGTCTTCTCCGTCCTACGACCCXAACCTGCTGGCGTCGCATAA 

CCCCGAGGTGCAGGCGCAAGCCTGGCAGCGGCTTGGCGACAATCCCGCCTCTCCACTGACCAA 

CCGTGCCATCTCTGAGACGTATCCACCGGGTTCGACTTTCAAAGTGATCACCACTGCGGCCGCG 

CTGGCCGCCGGGGCCACCGAGACCGAACAGCTGACTGCGGCGCCCACAATTCCGTTGCCAGG 

CAGCACCGCCCAGCTAGAGAACTACGGCGGTGCGCCGTGCGGGGACGAACCCACCGTGTCGC
TGCGTGAGGCATTCGTCAAATCATGCAACACCGCATTCGTCCAGCTGGGCATCCGCACCGGCG

CCGACGCCCTGCGCAGCATGGCGCGCGCGTTCGGTCTCGATAGCCCACCGCGCCCAACTCCG 

CTGCAAGTGGCGGAATCAACCGTCGGGCCTATCCCGGACAGCGCCGCACTAGGGATGACCAGT 

ATCGGCCAAAAGGACGTTGCGCTGACCCCGCTAGCGAACGCAGAAATAGCCGCGACCATCGCA 

AACGGCGGCATTACGATGAGGCCTTATCTAGTCGGCAGCCTCAAGGGACCGGACCTAGCCAAT 

ATCTCAACCACCGTCGGATACCAGCAGCGCCGCGCGGTGTCACCGCAGGTCGCCGCTAAGCTA 

ACAGAGCTGATGGTCGGCGCCGAGAAAGTCGCACAGCAGAAAGGGGCAATCCCCGGCGTGCA 

GATCGCATCCAAGACGGGCACCGCCGAACATGGCACCGACCCTCGTCACACTCCACCGCACGC 

TTGGTACATCGCCTTTGCGCCCGCACAAGCGCCCAAGGTGGCTGTTGCCGTGCTGGTGGAGAA 

CGGGGCTGATCGGCTGTCCGCCACCGGAGGTGCCCTCGCGGCACCGATCGGGCGGGCGGTG 

ATCGAAGCCGCACTGCAGGGGGAACCATGA 

>Rv0017c rodA TB.seq 20234:21640 MW:50612
>emb|AL123456|MTBH37RV:c21640-20231. rodA SEQ ID NO:7

ATGACGACACGACTGCAAGCGCCGGTGGCCGTAACGCCCCCGTTGCCGACTCGGCGCAACGC 

TGAACTGCTGCTGCTGTGCTTTGCCGCCGTAATCACGTTTGCCGCACTGCTGGTCGTGCAGGCC 

AATCAAGACCAGGGGGTGCCCTGGGACTTGACTAGCTACGGACTGGCCTTCCTGACCCTGTTC 

GGATCCGCGCATCTGGCCATCCGGCGCTTCGCCCCCTACACTGACCCGCTGTTGCTCCCGGTG 

GTGGCACTGCTCAACGGACTTGGCCTGGTAATGATCCACCGCCTCGATCTGGTGGACAACGAG 

ATCGGCGAGCATCGGCACCCCAGCGCAAACCAGCAGATGCTGTGGACGCTGGTGGGCGTAGC 

TGCCTTCGCGCTCGTGGTGACCTTCCTCAAGGACCACCGACAGCTCGCACGCTACGGCTACATT 

TGCGGGCTCGCGGGTCTGGTTTTCTTGGCAGTTCCCGCGCTGCTCCCGGCAGCACTGTCCGAA 

CAGAACGGCGCCAAGATCTGGATCCGGTTGCCCGGCTTCTCGATTCAACCCGCCGAATTTTCAA 

AGATTCTGCTGCTGATCTTCTTTTCGGCGGTACTGGTGGCCAAACGCGGCCTGTTCACCAGCGC 

CGGCAAACATTTGCTCGGAATGACCCTGCCGCGCCCGCGAGACCTCGCGCCACTGTTGGCAGC 

CTGGGTCATCTCGGTGGGTGTGATGGTCTTCGAGAAAGACCTCGGCGCTTCGCTGGTGCTGTAC 

ACATCGTTTCTGGTGGTGGTTTACCTCGCCACCCAGCGGTTCAGTTGGGTCGTCATCGGCCTGA 

CTCTGTTCGCGGCAGGAACCTTGGTGGCGTACTTCATTTTTGAGCACGTCCGGCTCCGCGTACA 

GACCTGGCTGGATCCGTTCGCAGATCCAGACGGCACCGGATATCAGATCGTGCAGTCGCTTTTC 

AGCTTCGCTACAGGCGGTATCTTCGGCACCGGGCTCGGTAATGGTCAACCCGACACCGTGCCC 

GCGGCATCCACCGATTTCATCATCGCCGCGTTCGGCGAAGAGCTTGGGTTGGTGGGCTTGACG 

GCCATCCTGATGCTCTACACCATCGTGATCATCCGGGGTTTGCGCACGGCCATCGCCACCCGC 

GATAGCTTCGGCAAGCTGCTGGCCGCCGGCCTCTCATCGACGCTAGCCATTCAGCTGTTCATCG 

TCGTCGGCGGTGTGACCCGACTCATTCCGCTGACCGGGTTGACCACACCGTGGATGTCCTACG 

GCGGGTCTTCACTGCTGGCCAACTACATATTGCTGGCCATCCTGGCACGCATCTCGCACGGAGC 

CCGCCGCCCACTGCGCACCCGCCCACGAAATAAGTCGCCGATTACGGCGGCCGGCACCGAGG 

TCATCGAACGCGTATGA

>Rv0018c ppp TB.seq 21640:23181 MW:53781
>emb|AL123456|MTBH37RV:c23181-21637. ppp SEQ ID NO:8 

GTGGCGCGCGTGACCCTGGTCCTGCGATACGCGGCGCGCAGCGATCGCGGCTTGGTACGCGC 

CAACAACGAAGACTCGGTCTACGCTGGGGCACGGCTATTGGCCCTGGCCGACGGCATGGGTG
GGCATGCGGCCGGCGAGGTGGCGTCCCAGTTGGTGATTGCCGCATTGGCCCATCTCGATGACG 
ACGAGCCCGGTGGCGATCTGCTGGCCAAGCTGGATGCCGCGGTGCGCGCCGGCAACTGGGCT 
ATCGCAGCGCAAGTCGAGATGGAGCCCGATCTCGAAGGCATGGGTACCACGCTCACCGCAATC 
CTGTTCGCGGGCAACCGGCTCGGCCTGGTGCATATCGGTGACTCGCGCGGTTACCTGCTGCGC 

GACGGTGAGCTGACGCAGATCACCAAGGACGACACGTTTGTCCAAACGCTGGTCGACGAAGGC
CGGATCACCCCGGAGGAGGCGCACAGCCACCCGCAACGCTCGTTGATCATGCGGGCGTTGAC 
CGGCCATGAGGTCGAACCGACGCTGACCATGCGAGAAGCCCGCGCCGGTGATCGTTACCTGCT 
GTGCTCGGACGGGTTGTCCGATCCGGTTAGCGATGAAACTATCCTCGAGGCCCTGCAGATCCC 
CGAGGTTGCCGAGAGCGCTCACCGCCTCATTGAACTGGCGCTGCGCGGCGGCGGCCCCGACA 

ACGTCACTGTCGTCGTCGCCGACGTCGTCGACTACGACTACGGCCAGACCCAACCGATTCTGG
CCGGGGCGGTCTCAGGCGACGACGACCAACTGACCCTGCCCAACACCGCCGCCGGCCGGGCC 
TCTGCCATCAGCCAGCGCAAGGAGATCGTTAAACGCGTTCCGCCACAGGCCGATACATTCAGTC 
GGCCACGGTGGTCGGGCCGACGGCTAGCATTCGTTGTCGCACTGGTGACCGTGCTGATGACTG 
CGGGCCTGCTCATTGGTCGCGCGATCATCCGCAGCAACTACTACGTAGCGGACTACGCCGGCA 

GCGTGTCCATCATGCGGGGGATTCAAGGGTCGCTACTGGGCATGTCCCTGCACCAGCCTTACC
TGATGGGCTGCCTCAGCCCGCGTAACGAGCTGTCGCAGATCAGCTACGGACAGTCTGGGGGCC 
CTCTCGACTGCCATCTGATGAAACTGGAGGATCTGCGACCGCCGGAGCGCGCACAGGTTCGGG 
CCGGTCTCCCGGCCGGCACTCTCGATGACGCCATCGGGCAGTTGCGCGAACTGGCGGCCAACT 
CCCTGCTGCCGCCTTGCCCGGCGCCGCGTGCCACGTCCCCGCCCGGGCGCCCGGCCCCACCC 

ACCACCAGCGAGACAACCGAACCAAACGTCACCTCCTCGCCAGCCTCTCCATCACCCACCACCT
CCGCGCCGGGCCXJCACCGGAACTACTCCTGCCATCCCCACGAGTGCCTCCCCGGCAGCGCCC 
GCGTCGCCGCCGACGCCTTGGCCCGTCACCAGCTCGCCGACGATGGCCGCACTTCCGCCACC 
CCCGCCTCAGCCGGGCATCGACTGCCGGGCGGCGGCATGA 

>Rv0019c - TB.seq 23273:23737 MW:17153

>emb|AL123456|MTBH37RV:c23737-23270. Rv0019c SEQ ID NO:9 

ATGCAGGGGTTGGTACTGCAACTGACGCGTGCCGGATTCTTGATGTTGTTGTGGGTATTCATCT 
GGTCCGTGGTACGGATCTTGAAGACCGACATTTATGCGCCGACCGGCGCGGTCATGATGCGCC 
GGGGCCTGGCGCTGCGAGGGACGCTCTTAGGCGCGCGTCAGCGCCGGCACGCTGCACGCTAC 
CTGGTGGTGACCGAAGGTGCGTTGACTGGCGCGCGTATCACGCTGAGCGAACAGCCGGTGTTG
ATCGGGCGCGGCGACGACTCGACCCTGGTGCTGACCGACGACTACGCCTCGACGCGGCACGC 
TCGGCTGTCTATGCGCGGCTCCGAGTGGTACGTCGAAGATCTAGGATCGACCAACGGCACTTA 
CCTGGACAGGGCGAAGGTGACGACTGCGGTACGAGTTCCGATCGGAACGCCGGTTCGCATCG
GCAAAACTGCAATCGAGTTGCGCCCGTGA 

>Rv0020c - TB.seq 23864:25444 MW:56881 

>emb|AL123456|MTBH37RV:c25444-23861. Rv0020c SEQ ID NO:10

ATGGGTAGCCAGAAAAGGCTGGTTCAGCGCGTTGAGCGCAAACTCGAGCAGACGGTTGGCGAT 
GCGTTTGCCCGCATCTTTGGAGGCTCGATCGTCCCGCAAGAGGTCGAAGCCCTGCTGCGCCGC 
GAGGCGGCCGACGGCATCCAGTCGCTGCAGGGAAATCGCCTTTTGGCGCCCAACGAATACATC 
ATTACCCTCGGTGTGCACGACTTTGAGAAGTTGGGCGCTGATCCTGAGCTGAAGTCAACCGGTT 

TTGCTCGGGACTTGGCGGACTATATCCAAGAACAGGGGTGGCAAACGTATGGTGATGTGGTCGT
CCGATTCGAGCAGTCGTCGAACCTGCATACCGGCCAGTTCCGCGCCCGCGGCACTGTTAACCC 
CGACGTTGAGACCCACCCGCCGGTCATCGATTGCGCCCGGCCACAATCAAACCACGCGTTTGG
CGCAGAACCAGGAGTAGCACCAATGAGTGACAATTCGAGCTACCGTGGCGGTCAGGGGCAGGG 
GCGTCCCGACGAGTATTACGACGACCGCTATGCGCGTCCGCAAGAGGATCCGCGTGGTGGCCC 

GGATCCGCAAGGCGGATCTGACCCCCGCGGGGGGTATCCACCCGAGACGGGCGGCTACCCGC
CCCAGCCGGGCTACCCACGCCCGCGCCACCCGGACCAGGGCGACTACCCCGAGCAAATCGGG 
TACCCCGACCAGGGCGGTTACCCCGAGCAACGCGGTTACCCCGAGCAACGCGGCTACCCCGA
CCAGCGCGGGTACCAGGACCAGGGTCGAGGCTACCCCGACCAAGGGCAGGGGGGCTATCCGC 
CGCCCTACGAGCAACGCCCTCCTGTTTCTCCCGGCCCGGCTGCCGGCTACGGCGCTCCCGGCT 

ACGACCAGGGCTATCGCCAAAGCGGCGGCTACGGCCCTTCACCCGGTGGCGGCCAGCCCGGC
TACGGCGGGTACGGGGAGTACGGGCGTGGCCCGGCTCGCCACGAGGAGGGCAGCTATGTGCC 
CTCTGGCCCTCCGGGCCCGCCCGAGCAACGACCGGCTTACCCCGACCAAGGCGGTTACGACC 
AGGGCTACCAGCAAGGCGCCACGACATACGGCCGGCAAGACTATGGCGGCGGCGCTGACTAG 
ACCCGCTACACCGAATCCCCGCGGGTCCCGGGATACGCTCCTCAGGGTGGCGGGTACGCCGA 

ACCCGCCGGCCGAGACTACGACTACX3GCCAATCAGGCGCTCCGGACTACGGTCAGCCAGCGC
CCGGTGGCTACAGCGGTTACGGGCAGGGCGGCTATGGGTCCGCCGGAACGTCGGTTACGCTG 
CAGCTCGACGACGGCAGCGGACGCACTTACCAGCTCCGCGAGGGCTCCAACATCATCGGTCGC 
GGACAGGACGCCCAGTTCCGGCTGCCCGACACCGGTGTGTCACGCCGTCACTTGGAGATCCG 
GTGGGACGGGCAGGTCGCATTGCTCGCAGACCTGAACTCCACCAACGGCACCACTGTTAACAA 

TGCACCGGTACAGGAGTGGCAGTTGGCCGACGGTGATGTGATCCGCTTGGGACACTCCGAGAT
CATCGTCCGCATGCACTGA 

>Rv0032 bioF2 C-terminal similar to B. subtilis BioF TB.seq 34295:36607 MW:86245 
>emb|AL123456|MTBH37RV:34295-36610. bioF2 SEQ ID NO:11 
ATGCCCACTGGCTTGGGCTATGACTTTCTGCGCCCTGTCGAGGACTCGGGGATCAACGACCTGA
AGCACTATTACTTCATGGCGGATTTGGCCGATGGGCAACCGCTAGGCCGGGCAAACGTCTATAG 
CGTCTGTTTCGACCTGGCCACCACCGACCGCAAGCTCACTCCGGCCTGGCGAACGACCATCAA 
ACGGTGGTTTCCGGGGTTTATGACCTTCCGTTTCCTCGAGTGCGGGTTGCTCACCATGGTGAGC
AACCCGCTGGCGTTGCGGTCCGACACCGACTTGGAGCGGGTATTGCCTGTGCTGGCCGGCCAG 
ATGGACCAGTTGGCGCATGACGACGGGTCGGATTTCTTGATGATCCGGGACGTGGACCCGGAA 
CACTACCAGCGATACCTTGACATCGTGCGCCCGTTGGGCTTTCGGCCTGCGCTGGGCTTTTCCC 
GGGTAGACACGACCATCAGCTGGTCGAGCGTGGAAGAGGCACTGGGCTGCCTGTCTCACAAAA
GGCGCCTGCCGTTGAAGACGTCGCTGGAGTTTCGTGAGCGGTTCGGTATCGAGGTCGAGGAAC 
TCGACGAGTATGCCGAGCATGCGCCGGTATTGGCCCGGCTTTGGCGCAACGTCAAGACGGAGG 
CAAAGGATTACCAGCGCGAGGACCTGAACCCTGAGTTCTTCGCGGCGTGTTCTCGGCATCTGCA 
TGGACGTAGCAGACTGTGGTTGTTCCGCTACCAGGGCACGCCAATTGCCTTCI I I I IGAACGTTT 
GGGGTGCGGATGAGAACTACATACTGCTTGAGTGGGGCATCGATCGTGATTTTGAACATTATAG
GAAGGCGAATCTGTACCGGGCGGCGCTGATGCTCAGCCTAAAAGATGCGATCAGCCGAGATAA 
ACGGCGAATGGAAATGGGTATTACGAACTATTTCACAAAACTTCGCATTCCGGGTGCCCGAGTC 
ATACCGACCATCTATTTCCTGCGTCACAGCACGGATCCGGTGCATACGGCAACGTTAGCGCGAA 
TGATGATGCACAATATTCAACGGCCAACGCTACCCGACGATATGTCGGAGGAATTGTGTCGCTG 
GGAAGAGCGAATACGTCTGGACCAGGACGGGCTACCCGAACACGATATCTTTCGCAAGATCGAT
CGTCAGCACAAATACACGGGGCTCAAACTCGGCGGAGTCTACGGTTTTTATCCCCGATTCACCG 
GACCGCAGCGATCCACGGTCAAGGCCGCGGAGCTGGGCGAGATCGTGTTGCTGGGCACGAAC 
TCGTATCTGGGCCTGGCCACCCATCCAGAGGTGGTGGAGGCCTCGGCGGAGGCCACGCGACG 
GTACGGCACCGGCTGCTCGGGTTCGCCGTTGCTGAACGGCACGTTGGACTTGCACGTCTCGCT 
TGAGCAGGAACTAGCCTGTTTTTTGGGCAAACCCGCCGCCGTGTTGTGCTCCACCGGATATCAG
AGCAACCTGGCGGCGATCAGCGCGCTATGCGAATCCGGGGACATGATCATCCAAGACGCGCTG 
AACCACCGCAGCCTGTTCGACGCCGCCAGGTTGTCCGGGGCCGACTTCACCTTGTACCGGCAC 
AACGACATGGACCACCTGGCGCGGGTGCTACGCCGCACCGAGGGGCGCCGCCGGATCATCGT 
CGTGGACGCGGTGTTCAGCATGGAAGGCACCGTCGCCGACCTGGCCACCATCGCCGAGCTTG 
CCGACCGGCACGGCTGCCGGGTCTATGTGGACGAGTCCCATGCGCTGGGCGTGCTCGGCCCC
GACGGGCGAGGAGCTTCGGCCGCGTTGGGTGTCTTGGCGCGCATGGACGTGGTGATGGGCAC 
GTTCAGCAAATCCTTTGCCTCCGTCGGCGGGTTCATCGCCGGAGATCGGCCCGTCGTGGACTA 
CATCCGGCACAACGGTTCAGGTCATGTGTTTTCCGCCAGCCTGCCGCCGGCCGCCGCGGCTGC 
CACCCACGCGGCTCTGCGCGTCAGTCGGCGTGAACCCGACCGGCGGGCTCGGGTGCTGGCCG 
CGGCCGAGTACATGGCCACCGGCCTGGCACGGCAGGGCTATCAGGCCGAGTATCACGGAACC
GCGATCGTGCCGGTGATCCTGGGCAACCCGACCGTGGCGCATGCGGGCTATCTGCGGCTGAT 
GCGCTCCGGGGTGTATGTGAACCCGGTGGCCCCCCCAGCCGTGCCGGAGGAGCGTTCGGGAT 
TCCGCACCAGCTACCTAGCCGACCACCGACAATCTGACCTCGACCGGGCCTTGCACGTGTTTGC 
CGGCCTTGCCGAGGACCTGACCCCGCAAGGAGCCGCGCTATGA 


>Rv0050 ponAI TB.seq 53661:55694 MW:71119 
>emb|AL123456|MTBH37RV:53661-55697. ponA SEQ ID NO:12 
GTGGTGATCCTGTTGCCGATGGTCACCTTCACGATGGCCTACCTGATCGTCGACGTTCCCAAGC

CAGGTGACATCCGTACCAACCAGGTCTCCACGATCCTTGCCAGCGACGGCTCGGAAATCGCCA 

AAATTGTTCCGCCCGAAGGTAATCGGGTCGACGTCAACCTCAGCCAGGTGCCGATGCATGTGC 

GCCAGGCGGTGATTGCGGCCGAAGACCGCAATTTCTATTCGAATCCGGGATTCTCGTTCACCGG 

CTTCGCGCGGGCAGTCAAGAACAACCTGTTCGGCGGCGATCTGCAGGGCGGATCGACGATTAC 

CCAGCAGTACGTCAAGAACGCGCTGGTCGGTTCCGCACAGCACGGGTGGAGCGGTCTGATGC 

GCAAGGCGAAAGAATTGGTCATCGCGACGAAGATGTCGGGGGAGTGGTCTAAAGACGATGTGC 

TGCAGGCGTATCTGAACATCATCTACTTCGGCCGGGGCGCCTACGGCATTTCGGCGGCGTCCA 

AGGCTTATTTCGACAAGCCCGTCGAGCAGCTGACCGTTGCCGAAGGGGCGTTGTTGGCAGCGC 

TGATTCGGCGGCCTTCGACGCTGGACCCGGCGGTCGACCCCGAAGGGGCCCATGCCCGCTGG 

AATTGGGTACTCGACGGCATGGTGGAAACXJAAGGCTCTCTCGCCGAATGACCGTGCGGCGCAG 

GTGTTTCCCGAGACAGTGCCGCCCGATCTGGCCCGGGCAGAGAATCAGACCAAAGGACCCAAC 

GGGCTGATCGAGCGGCAGGTGACAAGGGAGTTGCTCGAGCTGTTCAACATCGACGAGCAGACC 

CTCAACACCCAGGGGCTGGTGGTCACCACCACGATTGATCCGCAGGCCCAACGGGCGGCGGA 

GAAGGCGGTTGCGAAATACCTGGACGGGCAGGACCCCGACATGCGTGCCGCCGTGGTTTCCAT 

CGACCCGCACAACGGGGCGGTGCGTGCGTACTACGGTGGCGACAATGCCAATGGCTTTGACTT 

CGCTCAAGCGGGATTGCAGACTGGATCGTCGTTTAAGGTGTTTGCTCTGGTGGCCGCCCTTGAG 

CAGGGGATCGGCCTGGGCTACCAGGTAGACAGCTCTCCGTTGACGGTCGACGGCATCAAGATC

ACCAACGTCGAGGGCGAGGGTTGCGGGACGTGCAACATCGCCGAGGCGCTCAAAATGTCGCT 

GAACACCTCCTACTACCGGCTGATGCTCAAGCTCAACGGCGGCCCACAGGCTGTGGCCGATGC 

CGCGCACCAAGCCGGCATTGCCTCCAGCTTCCCGGGCGTTGCGCACACGCTGTCCGAAGATGG 

CAAGGGTGGACCGCCCAACAACGGGATCGTGTTGGGCCAGTACCAAACCCGGGTGATCGACAT 

GGCATCGGCGTATGCCACGTTGGCCGCGTCCGGTATCTACCACCCGCCGCATTTCGTACAGAA 

GGTGGTCAGTGCCAACGGCCAGGTCCTCTTCGACGCCAGCACCGCGGACAACACCGGCGATCA 

GCGCATCCCCAAGGCGGTAGCCGACAACGTGACTGCGGCGATGGAGCCGATCGCAGGTTATTC 

GCGTGGCCACAACCTAGCGGGTGGGCGGGATTCGGCGGCCAAGACCGGCACTACGCAATTTG 

GTGACACCACCGCGAACAAAGACGCCTGGATGGTCGGGTACACGCCGTCGTTGTCTACGGGTG 

TGTGGGTGGGCACCGTCAAGGGTGACGAGCCACTGGTAACCGCTTCGGGTGCAGCGATTTACG 

GCTCGGGCCTGCCGTCGGACATCTGGAAGGCAACCATGGACGGCGCCTTGAAGGGCACGTCG 

AACGAGACTTTCCCCAAACCGACCGAGGTCGGTGGTTATGCCGGTGTGCCGCCGCCGCCGCCG 

CCGCCGGAGGTACCACCTTCGGAGACCGTCATCCAGCCCACGGTCGAAATTGCGCCGGGGATT 

ACCATCCCGATCGGTCCCCCGACCACCATTACCCTGGCGCCACCGCCCCCGGCCCCGCCCGCT 

GCGACTCCCACGCCGCCGCCGTGA 

>Rv0051 - TB.seq 55694:57373 MW:61210

>emb|AL123456|MTBH37RV:55694-57376. Rv0051 SEQ ID NO:13
GTGACCGGCGCGCTGTCCCAAAGCAGCAACATCTCGCCACTTCCTTTGGCCGCCGATCTGCGG
AGCGCCGATAACCGCGATTGCCCCAGCCGCACCGACGTATTGGGTGCCX3CTCTGGCGAATGTC 
GTCGGTGGCCCGGTAGGCCGGCACGCGCTGATCGGCCGCACCCGGCTGATGACCCCGCTGCG 
GGTGATGTTTGCAATCGCGTTGGTGTTCCTGGCGCTCGGTTGGTCGACGAAAGCGGCCTGCTT 

GCAGTCCACXJGGAACCGGTCCAGGTGATCAGCGGGTGGCCAACTGGGATAACCAGCGTGCTTA
CTACCAGTTGTGCTACTCCGATAGGGTGCCGCTCTATGGCGCTGAGTTATTGAGCCAAGGCAAG 
TTTCCGTACAAATCAAGCTGGATCGAAACCGACAGCAACGGCACACCGCAGCTGCGCTACGAC 
GGACAGATCGCGGTGCGCTATATGGAGTATCCGGTGCTGACTGGGATCTATCAGTACCTGTCGA 
TGGCGATAGCCAAGACCTACACCGCGTTAAGCAAGGTGGCTCCCCTCCCGGTGGTTGCCGAAG 

TGGTGATGTTCTTCAACGTCGCCGCGTTCGGTTTGGCGCTGGGGTGGCTGACAACCGTCTGGG
CGACCTCGGGCCTGGCCGGCCGCCGGATATGGGATGCGGCGCTGGTGGCCGCCTCACCGCTG 
GTGATCTTTCAGATATTCACCAATTTCGATGCGCTGGCAACGGGTTTGGCGACGAGTGGGCTGC 
TGGCCTGGGCGCGGCGCAGACCGGTGCTTGCCGGTGTGCTGATCGGGTTGGGCTCCGCGGCG 
AAACTGTATCCGCTGTTGTTCTTGTACCCGTTGTTGCTGCTGGGCATCCGGGCCGGTCGCCTGA 

ATGCTCTGGCCCGCACCATGGCGGCCGCGGCGGCGACCTGGTTGTTGGTGAATCTGCCGGTGA
TGCTGCTCTTTCCGCGCGGCTGGTCGGAGTTCTTCCGGCTCAACACCCGGCGCGGCGACGACA 
TGGACTCGTTGTACAACGTCGTCAAGTCGTTCACCGGCTGGCGTGGCTTCGACCCCACCCTGG 
GCTTCTGGGAGCCGCCGCTGGTGCTGAACACGGTTGTCACGCTCTTGTTCGTGTTATGTTGTGC 
GGCAATTGCTTACATCGCGCTCACCGCACCCCACCGGCCGCGCGTGGCGCAGCTGACTTTCTT 

GACGGTGGCCAGCTTCCTGTTGGTCAACAAGGTGTGGAGTCCCCAGTTCTCGCTTTGGCTGGTG
CCGCTGGCCGTGCTGGCTTTGCCGCACCGCCGGATCTTGCTGGCGTGGATGACGATCGACGCG 
TTGGTGTGGGTGCCGCGGATGTACTACCTATACGGCAACCCGAGCCGCTCGCTGCCCGAGCAG 
TGGTTCACCACGACGGTGTTGCTGCGTGACATCGCCGTGATGGTGCTGTGCGGACTGGTGGTC 
TGGCAGATCTACCGCCCCGGGCGCGACCTCGTGCGTACCGGCGGGCCAGGGGCACTGCCGGC 

TTGTGGGGGAGTCGACGACCCGGTGGGAGGGGTCTTTGCCAACGCCGCCGACGCCCCGCCAG
GTCGGCTACCGTCGTGGCTGCGTCCCCGGCTGGGCGACGAGCATGCGCGAGAGAGGACGCCC 
GATGCAGGTCGCGATCGCACTTTTTCCGGGCAACACCGCGCTTGA 

>Rv0106 - TB.seq 124372:125565 MW:43701

>emb|AL123456|MTBH37RV:124372-125568. Rv0106 SEQ ID NO:14

ATGCGTACTCCGGTGATATTGGTGGCAGGTCAGGATCACACCGACGAGGTGACGGGCGCCTTG 
TTGCGCCGGACCGGAACGGTGGTCGTGGAGCACCGGTTTGACX3GCCATGTGGTGCGACGGAT 
GACTGCCACGCTGAGCCGTGGCGAATTGATCACCACGGAGGACGCTTTGGAGTTCGCCCACGG 
CTGTGTGTCGTGCACAATCCGCGACGAGCTGCTGGTGCTGTTACGCAGACTGCACCGCCGAGA 

CAATGTCGGCCGGATCGTCGTGCACCTGGCGCCGTGGCTGGAGCCCCAGCCCATCTGCTGGG
CGATCGACCACGTGCGGGTTTGCGTCGGACACGGATACCCAGACGGACCAGCCGCCCTCGAC 
GTGCGGGTCGCGGCCGTGGTGACCTGTGTGGACTGCGTAAGGTGGCTGCCGCAGTCACTCGG 
CGAGGACGAACTGCCCGACGGGCGCACGGTGGCCCAAGTGACGGTCGGTCAGGCCGAGTTCG

CCGACCTTCTGGTGCTGACCCACCCGGAACCGGTCGCCGTGGCGGTTCTGCGCCGACTGGCC 

CCTCGAGCGCGAATCACCGGCGGCGTCGACCGCGTCGAGCTGGCGCTGGCGCATCTGGACGA 

CAACTCACGGAGGGGTCGTACCGATACCCCGCACACGCCATTGCTGGCGGGCCTGCCTCCGTT 

GGCAGCCGACGGTGAGGTTGCGATCGTGGAATTCAGTGCCCGCCGCCCGTTTCACCCGCAACG 

TCTGCATGCCGCGGTTGACCTGCTGCTCGATGGCGTGGTTCGCACTCGAGGTCGGCTGTGGCT 

GGCCAACCGGCCGGATCAGGTCATGTGGCTCGAATCAGCCGGTGGCGGTCTGCGGGTCGCAT 

CGGCCGGAAAGTGGTTGGCGGCGATGGCGGCCTCGGAGGTGGCCTATGTCGACCTGGAGCGG 

CGGTTGTTCGCCGACCTGATGTGGGTCTACCCGTTCGGAGACCGGCACACCGCGATGACGGTA 

CTGGTATGCGGCGCCGATCCGACCGACATCGTCAATGCCCTGAACGCGGCGCTGGTCAGCGAC 

GACGAAATGGCATCTCCGCAACGCTGGCAGTCCTACGTCGACCCTTTCGGCGACTGGCATGAC 

GACCCGTGCCACGAAATGCCCGATGCGGCTGGGGAATTCTCGGCACACCGCAACTCAGGAGAA 

TCTCGATGA 

>Rv0125 - TB.seq 151146:152210 MW:34927

>emb|AL123456|MTBH37RV:151146-152213. pepA SEQ ID NO:15 

ATGAGCAATTCGCGCCGCCGCTCACTCAGGTGGTCATGGTTGCTGAGCGTGCTGGCTGCCGTC 

GGGCTGGGCCTGGCCACGGCGCCGGCCCAGGCGGCCCCGCCGGCCTTGTCGCAGGACCGGT 

TCGCCGACTTCCCCGCGCTGCCCCTCGACCCGTCCGCGATGGTCGCCCAAGTGGGGCCACAG 

GTGGTCAACATCAACACCAAACTGGGCTACAACAACGCCGTGGGCGCCGGGACCGGCATCX3TC 

ATCGATCCCAACGGTGTCGTGCTGACCAACAACCACGTGATCGCGGGCGCCACCGACATCAAT 

GCGTTCAGCGTCGGCTCCGGCCAAACCTACGGCGTCGATGTGGTCGGGTATGACCGCACCCAG 

GATGTCGCGGTGCTGCAGCTGCGCGGTGCCGGTGGCCTGCCGTCGGCGGCGATCGGTGGCG 

GCGTCGGGGTTGGTGAGCCCGTCGTCGCGATGGGCAACAGCGGTGGGCAGGGCGGAACGCC 

CCGTGCGGTGCCTGGCAGGGTGGTCGCGCTCGGCCAAACCGTGCAGGCGTCGGATTCGCTGA

CCGGTGCCGAAGAGACATTGAACGGGTTGATCCAGTTCGATGCCGCGATCCAGCCCGGTGATT 

CGGGCGGGCCCGTCGTCAACGGCCTAGGACAGGTGGTCGGTATGAACACGGCCGCGTCCGAT 

AACTTCCAGCTGTCCCAGGGTGGGCAGGGATTCGCCATTCCGATCGGGCAGGCGATGGCGATC 

GCGGGCCAGATCCGATCGGGTGGGGGGTCACCCACCGTTCATATCGGGCCTACCGCCTTCCTC 

GGCTTGGGTGTTGTCGACAACAACGGCAACGGCGCACGAGTCCAACGCGTGGTCGGGAGCGC 

TCCGGCGGCAAGTCTCGGCATCTCCACCGGCGACGTGATCACCGCGGTCGACGGCGCTCCGAT 

CAACTCGGCCACCGCGATGGCGGACGCGCTTAACGGGCATCATCCCGGTGACGTCATCTCGGT 

GACCTGGCAAACCAAGTCGGGCGGCACGCGTACAGGGAACGTGACATTGGCCGAGGGACCCC 

CGGCCTGA 

>Rv0350 dnaK 70 kD heat shock protein, chromosome replication TB.seq 419833:421707 MW:66832 SEQ ID NO:16
>emb|AL123456|MTBH37RV:419833-421710. dnaK

ATGGCTCGTGCGGTCGGGATCGACCTCGGGACCACCAACTCCGTCGTCTCGGTTCTGGAAGGT 

GGCGACCCGGTCGTCGTCGCCAACTCCGAGGGCTCCAGGACCACCCCGTCAATTGTCGCGTTC 

GCCCGCAACGGTGAGGTGCTGGTCGGCCAGCCCGCCAAGAACCAGGCAGTGACCAACGTCGA 

TCGCACCGTGCGCTCGGTCAAGCGACACATGGGCAGCGACTGGTCCATAGAGATTGACGGCAA 

GAAATACACCGCGCCGGAGATCAGCGCCCGCATTCTGATGAAGCTGAAGCGCGACGCCGAGGC 

CTACCTCGGTGAGGACATTACCGACGCGGTTATCACGACGCCCGCCTACTTCAATGACGCCCAG 

CGTCAGGCCACCAAGGACGCCGGCCAGATCGCCGGCCTCAACGTGCTGCGGATCGTCAACGA 

GCCGACCGCGGCCGCGCTGGCCTACGGGCTCGACAAGGGCGAGAAGGAGCAGCGAATCCTGG 

TCTTCGACTTGGGTGGTGGCACTTTCGACGTTTCCCTGCTGGAGATCGGCGAGGGTGTGGTTGA 

GGTCCGTGCCACTTCGGGTGACAACCACCTCGGCGGCGACGACTGGGACCAGCGGGTCGTCG 

ATTGGCTGGTGGACAAGTTCAAGGGCACCAGCGGCATCGATCTGACCAAGGACAAGATGGCGA 

TGCAGCGGCTGCGGGAAGCCGCCGAGAAGGCAAAGATCGAGCTGAGTTCGAGTCAGTCCACCT 

CGATCAACCTGCCCTACATCACCGTCGACGCCGACAAGAACCCGTTGTTCTTAGACGAGCAGCT 

GACCCGCGCGGAGTTCCAACGGATCACTCAGGACCTGCTGGACCGCACTCGCAAGCCGTTCCA 

GTCGGTGATCGCTGACACCGGCATTTCGGTGTCGGAGATCGATCACGTTGTGCTCGTGGGTGG 

TTCGACCCGGATGCCCGCGGTGACCGATCTGGTCAAGGAACTCACCGGCGGCAAGGAACCCAA 

CAAGGGCGTCAACCCCGATGAGGTTGTCGCGGTGGGAGCCGCTCTGCAGGCCGGCGTCCTCA 

AGGGCGAGGTGAAAGACGTTCTGCTGCTTGATGTTACCCCGCTGAGCCTGGGTATCGAGACCA 

AGGGCGGGGTGATGACCAGGCTCATCGAGCGCAACACCACGATCCCCACCAAGCGGTCGGAG 

ACTTTCACCACCGCCGACGACAACCAACCGTCGGTGCAGATCCAGGTCTATCAGGGGGAGCGT 

GAGATCGCCGCGCACAACAAGTTGGTCGGGTCCTTCGAGCTGACCGGCATCCCGCCGGCGCC 

GCGGGGGATTCCGCAGATCGAGGTCACTTTCGACATCGACGCCAACGGCATTGTGCACGTCAC 

CGCCAAGGACAAGGGCACCGGCAAGGAGAACACGATCCGAATCCAGGAAGGCTCGGGCCTGT 

CCAAGGAAGACATTGACCGCATGATCAAGGACGCCGAAGCGCACGCCGAGGAGGATCGCAAGC 

GTCGGGAGGAGGCCGATGTTCGTAATCAAGCCGAGACATTGGTCTACCAGACGGAGAAGTTCG 

TCAAAGAACAGCGTGAGGCCGAGGGTGGTTCGAAGGTACCTGAAGACACGCTGAACAAGGTTG 

ATGCCGCGGTGGCGGAAGCGAAGGCGGCACTTGGCGGATCGGATATTTCGGCCATCAAGTCG 

GCGATGGAGAAGCTGGGCCAGGAGTCGCAGGCTCTGGGGCAAGCGATCTACGAAGCAGCTCA 

GGCTGCGTCACAGGCGACTGGCGCTGCCCACCCCGGCGGCGAGCCGGGCGGTGCCCACCCC 

GGCTCGGCTGATGACGTTGTGGACGCGGAGGTGGTCGACGACGGCCGGGAGGCCAAGTGA 

>Rv0351 grpE stimulates DnaK ATPase activity TB.seq 421707:422411 MW:24501
>emb|AL123456|MTBH37RV:421707-422414. grpE SEQ ID NO:17

GTGACGGACGGAAATCAAAAGCCGGATGGCAATTCGGGCGAACAGGTAACCGTCACTGACAAG 

CGGCGGATCGATCCCGAGACGGGTGAAGTGCGGCACGTCCCTCCCGGCGACATGCCGGGAGG 

GACGGCTGCGGCCGATGCGGCGCACACCGAAGACAAGGTCGCCGAGCTGACCGCCGATCTGC 
AACGCGTGCAGGCCGACTTCGCCAACTACCGTAAGCGGGCGTTGCGCGATCAGCAGGCGGCC

GCTGACCGAGCCAAGGCCAGCGTTGTCAGCCAATTGCTGGGTGTACTGGACGATCTCGAGCGG 

GCGCGCAAGCACGGCGATTTGGAGTCGGGTCCACTGAAGTCGGTCGCCGACAAGCTAGACAGC 

GCGTTGACCGGGCTGGGTCTGGTGGCGTTCGGTGCCGAGGGCGAGGATTTCGACCCCGTGCT 

GCACGAAGCGGTGCAACACGAGGGCGACGGCGGGCAGGGGTCCAAGCCGGTAATCGGCACC 

GTCATGCGGCAGGGCTACCAACTGGGTGAGCAGGTGCTGCGGCACGCCTTGGTCGGCGTCGT 

CGACACGGTGGTCGTCGACGCGGCCGAACTGGAGTCAGTCGACGACGGCACTGCGGTCGCAG 

ATACCGCCGAAAACGATCAAGCTGACCAGGGCAATAGCGCCGACACCTCGGGCGAACAGGCAG 

AATCAGAACCGTCGGGCAGTTAA 

>Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase TB.seq 422450:423634 MW:41346
>emb|AL123456|MTBH37RV:422450-423637. dnaJ SEQ ID NO:18

ATGGCCCAAAGGGAATGGGTCGAAAAAGACTTCTACCAGGAGCTGGGCGTCTCCTCTGATGCC 

AGTCCTGAAGAGATCAAACGTGCCTATCGGAAGTTGGCGCGCGACCTGCATCCGGACGCGAAC 

CCGGGCAACCCGGCCGCCGGCGAACGGTTCAAGGCGGTTTCGGAGGCGCATAACGTGCTGTC 

GGATCCGGCCAAGCGCAAGGAGTACGACGAAACCCGCCGCCTGTTCGCCGGCGGCGGGTTCG 

GCGGCCGTCGGTTCGACAGCGGCTTTGGGGGCGGGTTCGGCGGTTTCGGGGTCGGTGGAGAC 

GGCGCCGAGTTCAACCTCAACGACTTGTTCGACGCCGCCAGCCGAACCGGCGGTACCACCATC 

GGTGACTTGTTCGGTGGCTTGTTCGGACGCGGTGGCAGCGCCCGTCCCAGCCGCCCGCGACG 

CGGCAACGACCTGGAGACCGAGACCGAGTTGGATTTCGTGGAGGCCGCCAAGGGCGTGGCGA 

TGCCGCTGCGATTAACCAGCCCGGCGCCGTGCACCAACTGCCATGGCAGCGGGGCCCGGCCA 

GGCACCAGCCCAAAGGTGTGTCCCACTTGCAACGGGTCGGGCGTGATCAACCGCAATCAGGGC 

GCGTTCGGCTTCTCCGAGCCGTGCACCGACTGCCGAGGTAGCGGCTCGATCATCGAGCACCCC 

TGCGAGGAGTGCAAAGGCACCGGCGTGACCACCCGCACCCGAACCATCAACGTGCGGATCCC
GCCCGGTGTCGAGGATGGGCAGCGCATCCGGCTAGCCGGTCAGGGCGAGGCCGGGTTGCGC
GGCGCTCCCTCGGGGGATCTCTACGTGACGGTGCATGTGCGGCCCGACAAGATCTTCGGCCGC

GACGGCGACGACCTCACCGTCACCGTTCCGGTCAGCTTCACCGAATTGGCTTTGGGCTCGACG 

CTGTCGGTGCCTACCCTGGACGGCACGGTCGGGGTCCGGGTGCCCAAAGGCACCGCTGACGG
CCGCATTCTGCGTGTGCGCGGACGCGGTGTGCCCAAGCGCAGTGGGGGTAGCGGCGACCTAC
TTGTCACCGTGAAGGTGGCCGTGCCGCCCAATTTGGCAGGCGCCGCTCAGGAAGCTCTGGAAG
CCTATGCGGCGGCGGAGCGGTCCAGTGGTTTCAACCCGCGGGCCGGATGGGCAGGTAATCGC

TGA 

>Rv0363c fba fructose bisphosphate aldolase TB.seq 441266:442297 MW:36545 
>emb|AL123456|MTBH37RV:c442297-441263. fba SEQ ID NO:19

ATGCCTATCGCAACGCCCGAGGTCTACGCGGAGATGCTCGGTCAGGCCAAACAAAACTCGTAC 
GCTTTCCCGGCTATCAACTGCACCTCCTCGGAAACCGTCAACGCCGCGATCAAAGGTTTCGCCG 
ACGCCGGCAGTGACGGAATCATCCAGTTCTCGACCGGTGGCGCAGAATTCGGCTCCGGCCTCG

GGGTCAAAGACATGGTGACCGGTGCGGTCGCCTTGGCGGAGTTCACCCACX3TTATCGCGGCCA 

AGTACCCGGTCAACGTGGCGCTGCACACCGACCACTGCCCCAAGGACAAGTTGGACAGCTATG 

TCCGGCCCTTGCTGGCGATCTCGGCGCAACGCGTGAGCAAAGGTGGCAATCCTTTGTTCCAGT 

CGCACATGTGGGACGGCTCGGCAGTGCCAATCGATGAGAACCTGGCCATCGCCCAGGAGCTGC 

TCAAGGCGGCGGCGGCCGCCAAGATCATTCTGGAGATCGAGATCGGCGTCGTCGGCGGCGAA 

GAGGACGGCGTGGCGAACGAGATCAACGAGAAGCTGTACACCAGCCCGGAGGACTTCGAGAAA 

ACCATCGAGGCGCTGGGCGCCGGTGAGCACGGCAAATACCTGCTGGCCGCGACGTTCGGCAA 

CGTGCATGGCGTCTACAAGCCCGGCAACGTCAAGCTTCGCCCCGACATCCTTGCGCAAGGGCA 

ACAGGTGGCGGCGGCCAAGCTCGGACTGCCGGCCGACGCCAAGCCGTTCGACTTCGTGTTCC 

ACGGCGGCTCGGGTTCGCTTAAGTCGGAGATCGAGGAGGCGCTGCGCTACGGCGTGGTGAAG 

ATGAACGTCGACACCGACACCCAGTACGCGTTCACCCGCCCGATCGCCGGTCAGATGTTCACC 

AACTAGGACGGAGTGCTCAAGGTCGATGGCGAGGTGGGTGTCAAGAAGGTCTACGACCCGCGC 

AGCTACCTCAAGAAGGCCGAAGCTTCGATGAGCCAGCGGGTCGTTCAGGCGTGCAATGACCTG 

CACTGCGCCGGAAAGTGCCTAACCCACTAA 

>Rv0405 pks6 TB.seq 485729:489934 MW:147615
>emb|AL123456|MTBH37RV:485729-489937. pks6 SEQ ID NO:20

ATGACAGACGGTTCGGTCACTGCGGATAAGCTTCAAAAATGGTTTCGAGAGTACTTGTCCACGC 

ATATCGAGTGTCATCCAAATGAGGTCAGCCTAGACGTTCCGATTAGAGATTTAGGTTTGAAATCG 

ATTGATGTCTTAGCGATTCCCGGCGAGCTCGGTGACAGATTTGGGTTTTGTATTCCCGATTTGGC 

CGTTTGGGATAATCCTAGCGCTAATGATTTGATTGATAGTCTGTTGAACCAGCGTAGTGCTGACT 

CGTTAAGAGAGAGTCATGGACACGCCGACAGGAACACGCAGGGTCGGGGCAGCATAAACGAGC 

CGGTTGCGGTCATCGGAGTGGGCTGTCGATTTCCGGGAGATATTGACGGCCCGGAACGGCTAT 

GGGACTTTCTGACCGAGAAGAAGTGTGCGATAACAGCGTATCCAGATCGTGGGTTCACGAATGC 

TGGAACTTTCGCGGAGTCCGGAGGCTTTTTAAAGGATGTCGCGGGTTTCGATAATAGATTTTTTG 

ATATCCCGCCGGACGAGGCTCTGCGAATGGATCCGCAACAACGGTTGTTACTGGAGGTCTCTTG 

GGAAGCGTTAGAGCATGCAGGAATTATTCCTGAGTCATTAAGACTTTCACGTACGGGCGTATTC 

GTTGGGGTGTCGTCAACTGACTACGTCCGGCTTGTGTCAGCTAGCGCTCAGCAAAAGTCTACTA 

TTTGGGATAACACCGGCGGTTCTTCGAGTATTATTGCCAATAGAATCTCATACTTTCTCGATATTC 

AGGGTCCGTCCATTGTCATTGACACGGCATGCTCGTCATCCCTGGTCGCCGTGCATCTAGCCTG 

TCGAAGTCTCAGTACCTGGGACTGCGATATCGCACTTGTCGGTGGGACGAATGTTCTTATTTCAC 

CAGAACCATGGGGTGGGTTTAGGGAAGCGGGCATCTTGTCGCAGACAGGCTGCTGTCACGCGT 

TCGATAAATCCGCCGACGGGATGGTACGCGGTGAGGGATGCGGAGTTATCGTGCTGCAGCGCC 

TCAGTGATGCACGCCTTGAGGGCCGGCGGATATTAGCGATTCTGACGGGTTCAGGGGTCAATC 

AGGACGGTAAGTCCAACGGTATTATGGCGCCAAATCCTAGTGCGCAAATTGGTGTTCTTGAAAAT 

GCATGCAAGAGCGCTCGCGTCGATCCGCTGGAAATCGGCTACGTCGAGGCCCACGGGACCGG 
AACGTCGTTAGGGGATAGGATCGAGGCGCACGCCTTAGGCATGGTCTTTGGTCGCAAGAGACC

GGGATCTGGGCCCCTGATGATCGGGAGCATCAAGCCGAATATCGGCCATCTGGAAGGTGCGGC 

TGGCATCGCCGGATTGATCAAGGCGGTGTTGATGGTTGAGCGTGGCTCGCTGCTTCCGAGCGG 

GGGGTTTACGGAGCCAAATCCAGCTATCCCATTCACGGAATTGGGCCTGAGAGTTGTAGACGAA 

CTTCAGGAGTGGCCGGTGGTGGCGGGTCGGCCGCGCCGGGCTGGGGTGTCATCGTTCGGCTT 

TGGCGGCACCAATGCGCATGTGATTGTCGAGGAAGCTGGTTCGGTTGGGGCGGACACGGTTTC 

GGGCCGCGCGGATGTTGGCGGTTCCGGTGGTGGGGTGGTGGCGTGGGTGATTTCGGGGAAGA 

CGGCTTCGGCGTTGGCTGCTCAGGCGGGTCGGTTGGGGCGGTATGTGCGGGCTCGGCCGGCG 

CTTGATGTTGTTGATGTGGGGTATTCGTTGGTGAGCACGCGGTCGGTGTTTGATCATCGGGCGG 

TGGTGGTCGGCCAGACTCGCGATGAGTTGCTGGCTGGGTTGGCTGGGGTGGTTGCTGGTCGG 

CCGGAGGCTGGGGTGGTCTGCGGTGTTGGCAAGCCGGCGGGCAAGACGGCTTTTGTGTTTGC 

CGGTCAGGGCTCGCAGTGGCTGGGTATGGGTAGCGAGCTTTATGCTGCCTACCCGGTTTTCGC 

CGAGGCCCTCGATGCTGTGGTGGACGAGTTGGACCGGCACCTGCGGTATCCGCTGCGCGATGT 

GATCTGGGGGCACGACCAAGATCTGTTGAATACCACCGAATTCGCCCAGCCGGCGCTGTTTGC 

GGTGGAGGTGGCGCTGTATCGGCTGCTCATGTCGTGGGGGGTGCGGCCGGGTTTGGTGCTGG 

GTCATTCGGTGGGCGAGTTGGCCGCGGCGCACGTCGCCGGGGCGCTGTGTTTGCCGGATGGG 

GCGATGGTGGTGGCCGCGCGTGGACGGTTGATGCAGGCGTTGCCCGCCGGCGGCGCCATGTT 

TGCGGTGCAGGCCCGTGAAGACGAGGTAGCGCCGATGCTGGGGCACGATGTGAGCATCGCGG 

CGGTCAATGGTCCGGCTTCGGTGGTGATCTCTGGTGCCCACGATGCGGTGAGCGCGATCGCTG 

ATCGGCTGCGCGGCCAGGGCCGTCGGGTCCACCGGTTGGCGGTCTCGCATGCCTTTCACTCG 

GCGTTGATGGAGCCGATGATCGCTGAGTTCACAGCCGTTGCGGCCGAACTGTCTGTGGGCTTG 

CCCACGATCCCGGTCATTTCCAATGTGACCGGGCAGTTGGTGGGCGACGACTTCGCCTCAGCT 

GATTACTGGGCCCGGCATATCCGGGCGGTGGTGCGGTTTGGCGACAGTGTTCGTAGTGCCCAC 

TGCGCCGGTGCCAGTCGTTTCATCGAAGTCGGGCCCGGTGGCGGCTTGACGTCGTTGATCGAG 

GCATCGCTGGCCGACGCGCAGATCGTGTCGGTGCCCACGCTGCGCAAAGATCGGCCCGAACC 

GGTCAGTGTGATGACGGCGGCGGCCCAGGGCTTCGTCTCGGGGATGGGCCTGGATTGGGCCT 

CGGTGTTTTCCGGGTACCGGCCCAAGCGGGTGGAGTTGCCGACGTATGCCTTCCAGCATCAAA 

AGTTCTGGCTCGCACCAGCCCCATCGGTCAGCGACCCCACCGCCGCCGGCCAGATCGGGGCT 

AGCGATGGTGGTGCTGAACTCTTGGCGTCCTCCGGGTTTGCCGCCCGGCTGGCCGGTCGGTCG 

GCCGACGAGCAACTCGCCGCAGCGATCGAGGTGGTATGTGAGCATGCCGCAGCGGTGCTGGG 

GCGCGACGGCGCTGCCGGACTCGACGCTGGCCAGGCGTTTGCCGATTCGGGATTTAATTCCTT 

GAGTGCCGTGGAGCTACGTAACCGCTTAACAGGCGTCACCGCAGTAACGCTGCCGGCCACCGC 

GATCTTCGATCACCCCACCCCGACCGAACTAGCCCAGTATCTGATCACCCAAATAGACGGTCAC 

GGCAGCTCCGCCGCCGCAGCGGCAAACCCGGCGGAGCGAATCGATGCGCTCACCGATC mil 

CTACAAGCTTGCGATGCGGGTCGGGATGCCGATGGTTGGAAGATGGTCGCCCTGGCGTCGAAT 

ACGCGCGAGCGCATGAGCTCACCGGTTCGGAACAACGTATCGAAGAACGTCGCACTGCTGGCA 

GATGGTATCTCCGATGTGGTTGTAATTTGTATCCCAACTCTAACTGTGCTATCGGATCAGCGTGA 
ATATCGAGATATTGCGAATGCGATGACAGGCCGCCATTCGGTTTATTCGCTTACGCTTCCCGGG

TTCGATTCGTCTGATGCACTGCCGCAAAACGCGGATATGATTGTTGAAACCGTATCTAACGCAAT 

TATTGATGTGGTAGGCGGCAGCTGCCGTTTTGTGCTGTCGGGCTATTCATCGGGTGGGGTGTTG 

GCCTATGCCCTCTGCTCCCATCTGTCGGTCAAGCACCAGCGGAATCCCCTCGGAGTCGCACTCA 

TCGATACATATCTGCCTAGTCAGATCGCCAATCCTTCAATGAATGAAGGGTTCAGCCCCAACGAT 

ACTGGGAAGGGCCTTTCCCGTGAAGTAATTCGAGTGGCCAGAATGTTGAATCGGTTAACTGCCA 

CCCGACTCACCGCGGCAGCCACCTATGCTGCAATCTTTCAGGCCTGGGAACCAGGTAGATCAAT 

GGCTCCGGTTCTTAACATCGTGGCGAAGGACCGAATAGCTACCGTCGAAAATTTACGCGAAGAA 

CGAATCAACCGGTGGCGAACTGCTGCTGCAGAGGCGGCCTATTCTGTAGCCGAAGTACCCGGG 

GATCATTTCGGAATGATGAGCACCTCGAGTGAGGCAATAGCTACCGAAATACATGATTGGATTTC 

TGGGCTCGTTCGAGGGCCTCATCGGTAG 

>Rv0435c - ATPase of AAA-family TB.seq 522348:524531 MW:75315
>emb|AL123456|MTBH37RV:c524531-522345. Rv0435c SEQ ID NO:21 

GTGACCCACCCGGACCCGGCCCGCCAACTCACCCTTACCGCCCGGCTGAACACCTCGGCCGTC 

GACTCACGCCGCGGCGTCGTTCGGTTGCACCCCAATGCCATTGCTGCCCTTGGCATCCGCGAG 

TGGGACGCGGTGTCGCTGACCGGCTCTCGGACAACCGCCGCGGTCGCCGGCCTGGCCGCGGC 

AGACACCGCGGTCGGGACGGTGCTGCTCGATGACGTCACACTGTCCAATGCX3GGCCTTCGCGA 

AGGCACCGAGGTGATCGTCAGCCCGGTCACCGTCTACGGAGCGCGATCGGTGACGCTGAGCG 

GTTCAACGCTGGCCACCCAGTCGGTGCCGCCGGTCACGCTGCGGCAGGCCCTACTCGGCAAG 

GTGATGACCGTCGGTGACGCGGTCTCGCTGCTGCCCCGCGATCTAGGCCCCGGCACATCCACG 

TCGGCTGCCAGCCGCGCATTGGCAGCTGCGGTCGGGATCAGTTGGACCTCGGAGCTGCTGACC

GTTACCGGCGTCGACCCCGACGGGCCGGTCAGCGTGCAGCCCAACTCGCTGGTCACCTGGGG 

CGCTGGGGTCCCGGCCGCAATGGGTACGTCCACGGCCGGGCAAGTGAGCATCTCGAGTCCGG 

AGATCCAGATCGAAGAGCTCAAGGGCGCCCAGCCGCAGGCTGCCAAGCTCACCGAATGGCTCA 

AGCTTGCCCTCGATGAGCCGCACCTACTACAGACCTTGGGCGCCGGCACCAATTTGGGTGTGC 

TGGTGTCGGGTCCGGCCGGGGTGGGCAAGGCGACGCTGGTGCGCGCGGTGTGCGACGGCCG 

AAGGTTGGTGACACTGGATGGTCCGGAGATTGGAGCTCTGGCCGCCGGAGACCGGGTCAAAGC 

CGTGGCCTCGGCAGTGCAGGCGGTTCGCCATGAGGGCGGTGTGTTGCTGATCACCGATGCCGA 

CGCCCTGCTGCCAGCCGCCGCCGAGCCGGTAGCCTCGCTGATCCTGTCCGAGCTGCGTACCG 

CGGTGGCCACCGCCGGTGTGGTATTGATCGCCACCTCAGCACGGCCCGATCAACTCGATGCCC 

GGCTGCGTTCCCCCGAGTTGTGCGACCGGGAGCTTGGCCTGCCGCTGCCCGACGCGGCCACC 

CGCAAATCGCTGCTGGAGGCGCTGCTGAATCCGGTTCCTACCGGAGACCTCAACCTCGACGAA 

ATCGCCTCCCGCACACCGGG-nTCGTCGTGGCCGACCTGGCTGCGCTGGTTCGCGAGGCGGC 

GCTGCGGGCAGCGTCTCGAGCCAGTGCCGACGQCCGACCACCGATGCTGCACCAAGACGACC 

TCCTCGGTGCGTTGACCGTCATCCGGCCGCTGTCCCGCTCGGCCAGCGACGAAGTCACCGTGG 

GTGACGTGACGCTCGACGATGTCGGTGACATGGCCGCGGCCAAACAAGCACTGACCGAGGCG 
GTGCTGTGGCCGCTGCAGCACCCCGACACCTTCGCTCGGCTAGGTGTCGAACCGCCGCGCGG

GGTGTTGCTGTACGGCCCGCCCGGCTGCGGCAAGACCTTTGTGGTTCGTGCCCTGGCCAGCAC 

CGGACAGTTGAGCGTGCATGCCGTCAAAGGGTCGGAGCTGATGGACAAGTGGGTGGGCTCCTC 

GGAGAAGGCAGTCCGCGAGCTATTCCGGCGGGCCCGCGACTCCGCGCCGTCACTGGTGTTCC 

TCGACGAGCTGGACGCTCTGGCGCCACGGCGCGGTCAGAGCTTCGACTCGGGCGTCTCCGAC 

CGGGTGGTGGCCGCGCTGCTGACTGAGCTCGACGGTATTGACCCGCTGCGGGATGTCGTCATG 

CTAGGCGCGACCAACCGGCCCGATCTGATAGACCCGGCGCTGCTGCGCCCGGGGCGGCTAGA 

ACGGCTGGTGTTCGTTGAACCGCCCGACGCTGCCGCTCGCCGCGAAATCCTGCGCACCGCTGG 

CAAGTCGATCCCGCTGAGCTCCGACGTCGACCTGGACGAGGTGGCAGCCGGACTCGACGGTTA 

TAGTGCCGCCGACTGTGTGGCGCTGCTGCGCGAAGCCGCGCTTACCGCGATGCGGCGTTCCAT 

CGATGCCGCCAACGTCACCGCCGCCGACCTGGCGACCGCGCGAGAAACC<3TGCGCGCGTCGC 

TGGATCCGCTGCAGGTGGCGTCGCTGCGTAAGTTCGGCACCAAGGGTGACCTTCGGTCCTAG 

>Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase TB.seq 524531:525388 MW:31219
>emb|AL123456|MTBH37RV:c525388-524528. pssA SEQ ID NO:22

ATGATCGGAAAGCCCCGCGGCAGGCGAGGGGTAAACCTGCAGATACTGCCCAGCGCGATGAC 

GGTGCTGTCCATTTGCGCGGGACTGACCGCAATCAAGTTTGCGCTCGAGCACCAGCCGAAGGC 

CGCGATGGCACTGATCGCCGCAGCGGCCATCCTCGACGGGCTCGACGGCCGGGTGGCCCGCA 

TCCTGGATGCCCAGTCGCGGATGGGCGCAGAGATCGACTCACTGGCCGACGCGGTGAACTTCG 

GAGTGACACCCGCGGTGGTGCTTTACGTGTCGATGTTGTCGAAGTGGCCGGTCGGTTGGGTGG 

TCGTGCTGCTCTACGCGGTGTGCGTGGTATTACGGCTGGCGCGGTACAACGCACTGCAGGACG 

ACGGAACCCAGCCCGCCTACGCGCATGAATTCTTCGTCGGAATGCCCGCGCCGGCGGGCGCG 

GTTTCCATGATCGGCCTGCTAGCCCTCAAAATGCAGTTCGGCGAAGGATGGTGGACCTCGGGCT 

GGTTCCTCAGCTTTTGGGTGACGGGAACGTCGATACTCTTGGTCAGCGGGATCCCGATGAAAAA 

GATGCACGCCGTGTCGGTACCACCCAACTACGCGGCCGCCCTGCTGGCGGTGCTGGCTATCTG 

CGCGGCGGCCGCAGTCCTGGCCCCCTACTTGTTGATCTGGGTGATCATCATCGCCTACATGTGC 

CATATTCCTTTCGCGGTGCGCAGCCAGCGCTGGCTTGCCCAACACCCTGAGGTGTGGGACGAC 

AAGCCCAAGCAACGGCGCGCGGTGCGGCGCGCGAGCCGCCGGGCGCATCCCTACCGGCCGT 

CGATGGCGCGGCTGGGCCTGCGCAAGCCGGGTCGACGGCTGTGA 

>Rv0440 groEL2 60 kD chaperonin 2 TB.seq 528606:530225 MW:56728
>emb|AL123456|MTBH37RV:528606-530228. groEL2 SEQ ID NO:23 

ATGGCCAAGAGAATTGCGTACGACGAAGAGGCCCGTCGCGGCCTCGAGCGGGGCTTGAACGC 

CCTCGCCGATGCGGTAAAGGTGACATTGGGCCCCAAGGGCCGCAACGTCGTCCTGGAAAAGAA 

GTGGGGTGCCCCCACGATCACCAACGATGGTGTGTCCATCGCCAAGGAGATCGAGCTGGAGGA 

TCCGTACGAGAAGATCGGCGCCGAGCTGGTCAAAGAGGTAGCCAAGAAGACCGATGACGTCGC 

CGGTGACGGCACCACGACGGCCACCGTGCTGGCCCAGGCGTTGGTTCGCGAGGGCCTGCGCA 
ACGTCGCGGCCGGCGCCAACCCGCTCGGTCTCAAACGCGGCATCGAAAAGGCCGTGGAGAAG

GTCACCGAGACCCTGCTCAAGGGCGCCAAGGAGGTCGAGACCAAGGAGCAGATTGCGGCCAC 

CGCAGCGATTTCGGCGGGTGACCAGTCCATCGGTGACCTGATCGCCGAGGCGATGGACAAGGT 

GGGCAACGAGGGCGTCATCACCGTCGAGGAGTCCAACACCTTTGGGCTGCAGCTCGAGCTCAC 

CGAGGGTATGCGGTTCGACAAGGGCTACATCTCGGGGTACTTCGTGACCGACCCGGAGCGTCA 

GGAGGCGGTCCTGGAGGACCCCTACATCCTGCTGGTCAGCTCCAAGGTGTCCACTGTCAAGGA 

TCTGCTGCCGCTGCTCGAGAAGGTCATCGGAGCCGGTAAGCCGCTGCTGATCATCGCCGAGGA 

CGTCGAGGGCGAGGCGCTGTCCACCCTGGTCGTCAACAAGATCCGCGGCACCTTCAAGTCGGT 

GGCGGTCAAGGCTCCCGGCTTCGGCGACCGCCGCAAGGCGATGCTGCAGGATATGGCCATTCT 

CACCGGTGGTCAGGTGATCAGCGAAGAGGTCGGCCTGACGCTGGAGAACGCCGACCTGTCGC 

TGCTAGGCAAGGCCCGCAAGGTCGTGGTCACCAAGGACGAGACCACCATCGTCGAGGGCGCC 

GGTGACACCGACGCCATCGCCGGACGAGTGGCCCAGATCCGCCAGGAGATCGAGAAGAGCGA 

CTCCGACTACGACCGTGAGAAGCTGCAGGAGCGGCTGGCCAAGCTGGCCGGTGGTGTCGCGG 

TGATCAAGGCCGGTGCCGCCACCGAGGTCGAACTCAAGGAGCGCAAGCACCGCATCGAGGAT 

GCGGTTCGCAATGCCAAGGCCGCCGTCGAGGAGGGCATCGTCGCCGGTGGGGGTGTGACGCT 

GTTGCAAGCGGCCCCGACCCTGGACGAGCTGAAGCTCGAAGGCGACGAGGCGACCGGCGCCA 

ACATCGTGAAGGTGGCGCTGGAGGCCCCGCTGAAGCAGATCGCCTTCAACTCCGGGCTGGAGC 

CGGGCGTGGTGGCCGAGAAGGTGCGCAACCTGCCGGCTGGCCACGGACTGAACGCTCAGACC 

GGTGTCTACGAGGATCTGCTCGCTGCCGGCGTTGCTGACCCGGTCAAGGTGACCCGTTCGGCG 

CTGCAGAATGCGGCGTCCATCGCGGGGCTGTTCCTGACCACCGAGGCCGTCGTTGCCGACAAG 

CCGGAAAAGGAGAAGGCTTCCGTTCCCGGTGGCGGCGACATGGGTGGCATGGATTTCTGA 

>Rv0482 murB TB.seq 570537:571643 MW:38522
>emb|AL123456|MTBH37RV:570537-571646. murB SEQ ID NO:24 

ATGAAACGGAGCGGTGTCGGTTCGCTCTTTGCCGGTGCGCATATTGCCGAGGCGGTCCCGTTG 

GCGCCGCTGACCACTTTGCGTGTGGGCCCGATCGCCCGACGTGTCATCACTTGCACCAGCGCC 

GAACAGGTGGTGGCTGCGCTGCGGCACCTGGATTCGGCGGCCAAGACCGGAGCTGACCGCCC 

GCTGGTGTTTGCTGGTGGCTCCAATTTGGTGATCGCCGAGAACCTGACCGACCTGACCGTGGT 

GCGGTTGGCCAATAGCGGCATCACCATCGACGGTAACTTGGTGCGGGCCGAGGCCGGTGCGG 

TCTTCGATGACGTGGTGGTTAGGGCCATCGAACAGGGTCTGGGCGGACTGGAATGCCTGTCTG 

GCATCCCAGGATCGGCCGGGGCGACACCCGTGCAGAACGTGGGGGCGTATGGCGCGGAGGT 

GTCTGACACCATCACTCGGGTTCGGCTTTTGGATCGGTGCACGGGTGAGGTGCGTTGGGTATC 

CGCGCGCGACCTGCGCTTCGGCTATCGCACGAGCGTGCTCAAACACGCTGATGGGCTTGCGGT 

GCCCACCGTGGTCTTGGAGGTGGAGTTTGCGCTGGATCCGTCGGGCCGCAGCGCACCGCTGC 

GCTACGGCGAGCTGATCGCCGCGCTGAATGCGACCAGCGGCGAGCGCGCCGACCCGCAAGCG 

GTCCGCGAAGCGGTGCTGGCCCTGCGGGCACGCAAGGGCATGGTGCTGGACCCGACCGACCA 

TGACACCTGGAGCGTGGGATCGTTCTTCACAAACCCGGTGGTCACCCAGGATGTTTACGAACGG 
CTGGCCGGTGACGCGGCCACCAGAAAGGACGGTCCGGTCCCGCACTATCCCGCGCCCGACGG

CGTCAAGCTGGCCGCCGGCTGGCTGGTGGAACGGGCCGGCTTCGGCAAGGGCTATCCGGATG 

CCGGCGCCGCCCCATGGCGGCTTTCCACCAAACATGCGCTGGCGCTGACAAATCGTGGCGGG 

GCCACCGCCGAAGATGTGGTGACGCTGGCGCGCGCCGTGCGCGATGGGGTCCATGATGTGTTT 

GGTATCACACTAAAACCCGAACCCGTGCTGATCGGCTGCATGTTGTAG 

>Rv0483 - TB.seq 571708:573060 MW:47859 

>emb|AL123456|MTBH37RV:571708-573063. Rv0483 SEQ ID NO:25

GTGGTCATTCGTGTGCTGTTTCGCCCGGTATCTTTGATACCCGTGAATAACTCCAGCACCCCCCA 

GAGTCAGGGGCCGATCAGTCGGCGTCTGGCGTTGACGGCCCTTGGGTTTGGGGTGTTGGCACC 

GAACGTTCTGGTCGCGTGCGCCGGCAAAGTGACCAAGCTGGCCGAGAAGAGGCCGCCACCGG 

CGCCTCGTCTGACTTTCCGGCCTGCCGACTCTGCCGCCGACGTGGTGCCGATCGCGCCGATCA 

GCGTCGAGGTCGGTGACGGCTGGTTTCAGCGGGTCGCGCTGACCAATTCGGCAGGCAAGGTC 

GTCGCCGGGGCATACAGCCGGGATCGCACCATCTACACGATCACCGAGCCGCTGGGCTACGAC 

ACGACCTACACCTGGAGCGGTTCGGCCGTCGGCCATGACGGCAAGGCGGTTCCGGTGGCGGG 

CAAGTTCACCACCGTGGCACCCGTCAAGACGATCAACGCGGGATTCCAGCTCGCCGACGGCCA 

GACCGTCGGGATCGCGGCGCCGGTGATTATTCAGTTCGATTCACCGATCAGCGACAAGGCCGC 

CGTCGAGCGGGCACTAACCGTGACCACCGACCCGCCTGTCGAGGGCGGCTGGGCCTGGCTGC 

CCGACGAGGCGCAGGGCGCTCGCGTGCACTGGCGTCCTCGGGAGTACTACCCGGCGGGTACC 

ACCGTCGACGTCGACGCCAAGCTGTATGGGCTGCCGTTCGGCGACGGCGCGTACGGCGCGCA 

GGATATGTCGTTGCACTTCCAGATCGGTCGTCGTCAGGTGGTCAAGGCCGAAGTCTCGTCGCAC 

CGCATCCAAGTCGTCACCGATGCCGGCGTCATCATGGACTTCCCGTGCAGCTACGGCGAGGCC 

GACTTGGCGCGCAACGTCACCCGCAACGGCATCCACGTCGTCACCGAGAAATACTCGGACTTC 

TACATGTCCAACCCGGCCGCCGGTTACAGCCATATCCACGAACGTTGGGCGGTGCGGATTTCC 

AACAACGGCGAGTTCATCCATGCCAACCCTATGAGCGCCGGTGCCCAGGGCAACAGCAATGTC 

ACCAACGGCTGTATCAACCTGTCGACGGAGAACGCCGAACAGTACTACCGCAGCGCGGTCTAC 

GGTGACCCGGTTGAGGTGACCGGCAGTTCGATCCAGCTGTCCTACGCCGACGGTGACATCTGG 

GACTGGGCGGTGGACTGGGACACCTGGGTGTCGATGTCGGCGCTACCGCCACCGGCGGCCAA 

ACCGGCGGCGACGCAAATCCCGGTCACCGCCCCGGTCACGCCGTCGGATGCGCCCACCCCGT 

CCGGCACACCCACGACTACTAACGGACCGGGTGGGTAG 

>Rv0489 gpm phosphoglycerate mutase I TB.seq 578424:579170 MW:27217 
>emb|AL123456|MTBH37RV:578424-579173. gpm SEQ ID NO:26 

ATGGCAAACACTGGCAGCCTGGTGTTGCTGCGCCACGGCGAGAGCGACTGGAATGCCCTCAAC 
CTGTTCACCGGCTGGGTCGATGTCGGCCTGACGGACAAGGGCCAGGCAGAGGCGGTTCGAAG 
CGGCGAGCTGATCGCGGAACACGACCTATTGCCCGACGTGCTCTACACCTCGTTGCTGCGGCG 
CGCGATCACCACCGCGCATCTGGCGTTGGACAGCGCCGATCGGCTCTGGATTCCCGTGCGGCG 
TAGCTGGCGGCTCAACGAACGCCACTACGGCGCGCTGCAGGGTTTGGACAAGGCCGAGACCAA

GGCCCGCTATGGCGAAGAGCAGTTCATGGCCTGGCGGCGCAGCTATGACACGCCGCCGCCGC 

CGATCGAGCGGGGCAGTCAGTTCAGCCAGGACGCCGACCCTCGTTACGCCGACATCGGCGGT 

GGCCCGCTCACCGAATGTCTGGCTGACGTGGTCGCCCGGTTTTTGCCATATTTCACCGACGTCA 

TCGTTGGCGACTTGCGGGTCGGCAAGACGGTGCTGATCGTTGCCCACGGCAACTCGTTGCGCG 

CGCTGGTCAAGCACCTGGACCAGATGTCTGACGACGAAATCGTCGGACTGAACATCCCGACCG 

GAATTCCGCTGCGCTACGACCTGGATTCCGCGATGAGGCCGCTGGTGCGCGGTGGTAGGTATC 

TGGACCCGGAGGCGGCAGCCGCCGGCGCCGCCGCGGTGGCCGGCCAGGGCCGCGGGTAA 

>Rv0490 senX3 sensor histidine kinase TB.seq 579347:580576 MW:44794
>emb|AL123456|MTBH37RV:579347-580579. senX3 SEQ ID NO:27

GTGACTGTGTTCTCGGCGCTGTTGCTGGCCGGGGTTTTGTCGGCGCTGGCACTGGCCGTCGGT 

GGTGCTGTTGGAATGCGGCTGACGTCGCGGGTCGTCGAACAGCGCCAACGGGTGGCCACGGA 

GTGGTCGGGAATCACGGTTTCGCAGATGTTGCAATGCATTGTCACGCTGATGCCGCTGGGCGC 

CGCGGTGGTGGACACCCATCGCGACGTTGTCTACCTCAACGAACGGGCCAAAGAGCTAGGTCT 

GGTGCGCGACCGCCAGCTCGATGATCAGGCCTGGCGGGCCGCCCGGCAGGCGCTGGGTGGT 

GAAGACGTCGAGTTCGACCTGTCGCCGCGCAAGCGGTCGGCCACGGGTCGATCCGGGCTATC 

AGTGCATGGGCATGCCCGGTTGCTGAGCGAGGAAGACCGCCGGTTCGCCGTGGTGTTCGTGCA 

CGACCAGTCGGATTATGCGCGGATGGAGGCGGCTAGGCGTGACTTCGTGGCCAACGTCAGTCA 

CGAGCTCAAGACGCCCGTCGGTGCCATGGCTCTACTCGCCGAGGCGCTGCTGGCGTCGGGCG 

ACGACTCCGAAACCGTTCGGCGG-rrCGCCGAGAAGGTGCTCATTGAGGCCAACCGGCTCGGTG 

ACATGGTCGCCGAGTTGATCGAGCTATCCCGGCTACAGGGCGCCGAGCGGCTACCCAATATGA 

CCGACGTCGACGTCGATACGATTGTGTCGGAAGCGATTTCACGCCATAAGGTGGCGGCCGACA 

ACGCCGACATCGAAGTCCGCACCGACGCGCCGAGCAATCTGCGGGTGCTGGGCGACCAAACTG 

TGCTGGTTACCGCACTGGCAAACCTGGTTTCCAATGCGATTGCCTATTCGCCGCGCGGGTCGCT 

GGTGTCGATCAGCCGTCGCCGTCGCGGTGCCAACATCGAGATCGCCGTCACCGACCGGGGCA 

TCGGCATCGCGCCGGAAGACCAGGAGCGGGTCTTCGAACGGTTCTTCCGGGGGGACAAGGCG 

CGCTCGCGTGCCACCGGAGGCAGCGGACTCGGGTTGGCCATCGTCAAACACGTCGCGGCTAAT 

CACGACGGCACCATCCGCGTGTGGAGCAAACCGGGAACCGGGTCAACGTTCACCTTGGCTCTT 

CCGGCGTTGATCGAGGCCTATCACGACGACGAGCGACCCGAGCAGGCGCGAGAGCCCGAACT 

GCGGTCAAACAGGTCACAACGAGAGGAAGAGCTGAGCCGATGA 

>Rv0500 proC pyrroline-5-carboxylate reductase TB.seq 590081:590965 MW:30172 
>emb|AL123456|MTBH37RV:590081-590968. proC SEQ ID NO:28 

ATGCTTTTCGGCATGGCAAGGATCGCGATTATCGGCGGCGGCAGCATCGGTGAGGCATTGCTG 

TCGGGTCTGCTGCGGGCGGGCCGGCAGGiTCAAAGACCTGGTAGTGGCCGAGCGGATGCCCGA 

TCGCGCCAACTACCTGGCGCAGACCTATTCGGTGTTGGTGACGTCGGCGGCCGACGCGGTGGA 

GAACGCGACGTTCGTCGTCGTCGCGGTCAAACCAGCCGACGTCGAGCCGGTGATCGCGGATCT 

GGCGAACGCGACTGCGGCGGCCGAAAACGACAGTGCTGAGCAGGTGTTCGTCACCGTGGTAG 

CGGGCATCACGATCGCGTATTTCGAATCCAAGCTACCGGCTGGGACGCCAGTGGTGCGTGCGA 

TGCCGAACGCGGCGGCATTGGTGGGAGCGGGGGTTACAGCGCTGGCCAAAGGCCGCTTTGTC 

ACCCCGCAACAGCTTGAGGAGGTCTCGGCCTTGTTCGACGCGGTCGGCGGCGTGCTGACCGTT 

CCGGAATCGCAGTTGGACGCGGTGACCGCGGTGTCCGGCTCGGGTCCGGCCTATTTCTTTCTG 

CTGGTCGAGGCCCTGGTGGATGCCGGAGTCGGGGTGGGCTTGAGCCGTCAGGTGGCCACCGA 

TCTCGCCGCGCAGACAATGGCTGGCTCAGCGGCGATGCTGCTGGAGCGGATGGAGCAAGACC 

AGGGTGGCGCCAATGGCGAGCTGATGGGGCTGCGCGTGGACCTTACCGCATCACGGCTGCGC 

GCCGCGGTTACCTCGCCGGGCGGTACGACCGCCGCTGCGCTGCGGGAACTCGAACGCGGCG 

GGTTTCGGATGGCTGTCGACGCGGCGGTTCAAGCCGCCAAAAGCCGCTCTGAGCAGCTCAGAA 

TTACACCGGAATGA 

>Rv0528 - TB.seq 618303:619889 MW:57132 

>emb|AL123456|MTBH37RV:618303-619892, Rv0528 SEQ ID NO:29 

ATGTGGCGGTCGTTGACGTCGATGGGCACCGCGCTGGTGCTGCTGTTTTTGCTCGCGCTGGCT 

GCCATACCCGGGGCCCTGCTGCCGCAGCGTGGCCTCAACGCCGCCAAGGTGGACGACTACGT 

GGCCGCGCACCCACTCATCGGTCCGTGGCTGGACGAGCTGCAGGCCTTCGACGTGTTCTCCAG 

CTTCTGGTTCACCGCCATCTACGTGCTGCTGTTCGTGTCCCTCGTCGGCTGTCTGGCCCCGCGG 

ACGATCGAGCACGCCCGCAGCCTGCGGGCTACACCGGTCGCCGCCCCGCGCAACCTGGCCCG 

GCTGCCCAAGCACGCCCACGCCCGGCTGGCCGGCGAGCCCGCCGCCCTGGCCGCCACCATCA 

CGGGCCGGCTGCGCGGCTGGCGCAGCATCACCCGGCAACAAGGCGACAGCGTGGAAGTCTCC 

GCCGAGAAGGGCTACCTGCGCGAGTTCGGCAACCTGGTGTTCCACTTCGCGCTGCTGGGTCTG 

CTGGTGGCGGTGGCCGTCGGCAAGCTGTTCGGCTACGAGGGCAACGTGATCGTGATAGCCGA 

CGGCGGACCCGGTTTTTGTTCGGCGTCGCCGGCCGCGTTCGACTCGTTTCGCGCCGGCAACAC 

CGTCGACGGCACGTCGTTGCACCCGATCTGTGTGCGGGTCAACAACTTCCAAGCGCACTACCT 

GCCGTCCGGGCAGGCCACCTCGTTCGCCGCCGACATCGACTATCAGGCCGACCCGGCCACTG 

CTGACCTGATCGCCAACAGCTGGCGGCCCTACCGGCTGCAGGTCAATCACCCGCTGCGGGTCG 

GCGGCGACCGGGTGTACCTGCAGGGCCACGGCTATGCGCCCACCTTCACCGTGACGTTCCCG 

GACGGGCAGACCCGCACGTCGACCGTGCAGTGGCGACCCGACAACCCGCAGACCCTGCTGTC 

GGCGGGCGTCGTGCGCATCGACCCGCCGGCCGGCAGCTACCCCAACCCCGACGAGCGTCGCA 

AACACCAGATCGCCATCCAGGGCCTGCTGGCTCCCACCGAGCAGCTCGACGGCACCCTGCTGT 

CGTCGCGTTTCCCCGCGCTCAATGCCCCGGCGGTGGCCATCGACATCTACCGCGGCGACACCG 

GCCTGGACAGCGGGCGGCCCCAGTCGTTGTTCACCCTGGACCACCGGCTGATCGAGCAGGGC 

CGGCTGGTCAAGGAAAAGCGGGTCAACCTGCGCGCCGGTCAGCAAGTCCGCATCGACCAAGG 

CCCGGCGGCCGGCACGGTGGTCCGGTTCGACGGCGCGGTGCCGTTCGTCAACCTGCAGGTCT 

CCCACGACCCCGGCCAGTCCTGGGTGCTGGTCTTCGCAATCACGATGATGGCGGGACTGCTGG 

TGTCGCTGCTGGTGCGCAGGCGCCGGGTGTGGGCGCGGATCACGCCGACGACCGCGGGTACG 
GTAAACGTCGAGCTGGGCGGCCTGACGCGCACCGACAACTCCGGGTGGGGCGCCGAGTTCGA

GCGGCTGACCGGGCGGTTGCTGGCGGGTTTTGAGGCGCGGTCCCCGGACATGGCCGAAGCGG 

CCGCAGGGACCGGAAGGGACGTCGATTGA 

>Rv0667 rpoB [beta] subunit of RNA polymerase TB.seq 759805:763320 MW:129220 
>emb|AL123456|MTBH37RV:759805-763323. rpoB SEQ ID NO:30 

TTGGCAGATTCCCGCCAGAGCAAAACAGCCGCTAGTCCTAGTCCGAGTCGCCCGCAAAGTTCCT 

CGAATAACTCCGTACCCGGAGCGCCAAACCGGGTCTCCTTCGCTAAGCTGCGCGAACCACTTG 

AGGTTCCGGGACTCCTTGACGTCCAGACCGATTCGTTCGAGTGGCTGATCGGTTCGGCGCGCT 

GGCGCGAATCCGCCGCCGAGCGGGGTGATGTCAACCCAGTGGGTGGCCTGGAAGAGGTGCTC 

TACGAGCTGTCTCCGATCGAGGACTTCTCCGGGTCGATGTCGTTGTCGTTCTCTGACCCTCGTT 

TCGACGATGTGAAGGCACCCGTCGACGAGTGCAAAGACAAGGACATGACGTACGCGGCTCCAC 

TGTTCGTCACGGCCGAGTTCATCAACAACAACACCGGTGAGATCAAGAGTCAGACGGTGTTCAT 

GGGTGACTTCCCGATGATGACCGAGAAGGGCACGTTCATCATCAACGGGACCGAGCGTGTGGT 

GGTCAGCCAGCTGGTGCGGTCGCCCGGGGTGTACTTCGACGAGACCATTGACAAGTCCACCGA 

CAAGACGCTGCACAGCGTCAAGGTGATCCCGAGCCGCGGCGCGTGGCTCGAGTTTGACGTCGA 

CAAGCGCGACACCGTCGGCGTGCGCATCGACCGCAAACGCCGGCAACCGGTCACCGTGCTGC 

TCAAGGCGCTGGGCTGGACCAGCGAGCAGATTGTCGAGCGGTTCGGGTTCTCGGAGATCATGC 

GATCGACGCTGGAGAAGGACAACACCGTCGGCACCGACGAGGCGCTGTTGGACATCTACCGCA 

AGCTGCGTCCGGGCGAGCCCCCGACCAAAGAGTCAGCGCAGACGCTGTTGGAAAACTTGTTCT 

TCAAGGAGAAGCGCTACGACCTGGCCCGCGTCGGTCGCTATAAGGTCAACAAGAAGCTCGGGC 

TGCATGTCGGCGAGCCCATCACGTCGTCGACGCTGACCGAAGAAGACGTCGTGGCCACCATCG 

AATATCTGGTCCGCTTGCACGAGGGTCAGACCACGATGACCGTTCCGGGCGGCGTCGAGGTGC 

CGGTGGAAACCGACGACATCGACCACTTCGGCAACCGCCGCCTGCGTACGGTCGGCGAGCTG 

ATCCAAAACCAGATCCGGGTCGGCATGTCGCGGATGGAGCGGGTGGTCCGGGAGCGGATGAC 

CACCCAGGACGTGGAGGCGATCACACCGCAGACGTTGATCAACATCCGGCCGGTGGTCGCCG 

CGATCAAGGAGTTCTTCGGCACCAGCCAGCTGAGCCAATTCATGGACCAGAAGAACCCGCTGTC 

GGGGTTGACCCACAAGCGCCGACTGTCGGCGCTGGGGCCCGGCGGTCTGTCACGTGAGCGTG 

CCGGGCTGGAGGTCCGCGACGTGCACCCGTCGCACTACGGCCGGATGTGCCCGATCGAAACC 

CCTGAGGGGCCCAACATCGGTCTGATCGGCTCGCTGTCGGTGTACGCGCGGGTCAACCCGTTC 

GGGTTCATCGAAACGCCGTACCGCAAGGTGGTCGACGGCGTGGTTAGCGACGAGATCGTGTAC 

CTGACCGCCGACGAGGAGGACCGCCACGTGGTGGCACAGGCCAATTCGCCGATCGATGCGGA 

CGGTCGCTTCGTCGAGCCGCGCGTGCTGGTCCGCCGCAAGGCGGGCGAGGTGGAGTACGTGC 

CCTCGTCTGAGGTGGACTACATGGACGTCTCGCCCCGCCAGATGGTGTCGGTGGCCACCGCGA 

TGATTCCCTTCCTGGAGCACGACGACGCCAACCGTGCCCTCATGGGGGCAAACATGCAGCGCC 

AGGCGGTGCCGCTGGTCCGTAGCGAGGCCCCGCTGGTGGGCACCGGGATGGAGCTGCGCGC 

GGCGATCGACGCCGGCGACGTCGTCGTCGCCGAAGAAAGCGGCGTCATCGAGGAGGTGTCGG 
CCGACTACATCACTGTGATGCACGACAACGGCACCCGGCGTACCTACCGGATGCGCAAGTTTG

CCCGGTCCAACCACGGCACTTGCGCCAACCAGTGCCCCATCGTGGACGCGGGCGACCGAGTC 

GAGGCCGGTCAGGTGATCGCCGACGGTCCCTGTACTGACGACGGCGAGATGGCGCTGGGCAA 

GAACCTGCTGGTGGCCATCATGCCGTGGGAGGGCCACAACTACGAGGACGCGATCATCCTGTC 

CAACCGCCTGGTCGAAGAGGACGTGCTCACCTCGATCCACATCGAGGAGCATGAGATCGATGC 

TCGCGACACCAAGCTGGGTGCGGAGGAGATCACCCGCGACATCCCGAACATCTCCGACGAGGT 

GCTCGCCGACCTGGATGAGCGGGGCATCGTGCGCATCGGTGCCGAGGTTCGCGACGGGGACA 

TCCTGGTCGGCAAGGTCACCCCGAAGGGTGAGACCGAGCTGACGCCGGAGGAGCGGCTGCTG 

CGTGCCATCTTCGGTGAGAAGGCCCGCGAGGTGCGCGACACTTCGCTGAAGGTGCCGCACGG 

CGAATCCGGCAAGGTGATCGGCATTCGGGTGTTTTCCCGCGAGGACGAGGACGAGTTGCCGGC 

CGGTGTGAACGAGCTGGTGCGTGTGTATGTGGCTCAGAAACGCAAGATCTCCGACGGTGACAA 

GCTGGCCGGCCGGCACGGCAACAAGGGCGTGATCGGCAAGATCCTGCCGGTTGAGGACATGC 

CGTTCCTTGCCGACGGCACCCCGGTGGACATTATTTTGAACACCCACGGCGTGCCGCGACGGA 

TGAACATCGGCCAGATTTTGGAGACCCACCTGGGTTGGTGTGCCCACAGCGGCTGGAAGGTCG 

ACGCCGCCAAGGGGGTTCCGGACTGGGCCGCCAGGCTGCCCGACGAACTGCTCGAGGCGGAG 

CCGAACGCCATTGTGTCGACGCCGGTGTTCGACGGCGCCCAGGAGGCCGAGCTGCAGGGCCT 

GTTGTCGTGCACGCTGCCCAACCGCGACGGTGACGTGCTGGTCGACGCCGACGGCAAGGCCA 

TGCTCTTCGACGGGCGCAGCGGCGAGCCGTTCCCGTACCCGGTCACGGTTGGCTACATGTACA 

TCATGAAGCTGCACCACCTGGTGGACGACAAGATCCACGCCCGCTCCACCGGGCCGTACTCGA 

TGATCACCCAGCAGCCGCTGGGCGGTAAGGCGCAGTTCGGTGGCCAGCGGTTCGGGGAGATG 

GAGTGCTGGGCCATGCAGGCCTACGGTGCTGCCTACACCCTGCAGGAGCTGTTGACCATCAAG 

TCCGATGACACCGTCGGCCGCGTCAAGGTGTACGAGGCGATCGTCAAGGGTGAGAACATCCCG 

GAGCCGGGCATCCCCGAGTCGTTCAAGGTGCTGCTCAAAGAACTGCAGTCGCTGTGCCTCAAC 

GTCGAGGTGCTATCGAGTGACGGTGCGGCGATCGAACTGCGCGAAGGTGAGGACGAGGACCT 

GGAGCGGGCCGCGGCCAACCTGGGAATCAATCTGTCCCGCAACGAATCCGCAAGTGTCGAGGA 

TCTTGCGTAA 



>Rv0668 rpoC [beta]' subunit of RNA polymerase TB.seq 763368:767315 MW:146740 
>emb|AL123456|MTBH37RV:763368-767318. rpoC SEQ ID NO:31 

GTGCTCGACGTCAACTTCTTCGATGAACTCCGCATCGGTCTTGCTACCGCGGAGGACATCAGGC 

AATGGTCCTATGGCGAGGTCAAAAAGCCGGAGACGATCAACTACCGCACGCTTAAGCCGGAGA 

AGGACGGCCTGTTCTGCGAGAAGATCTTCGGGCCGACTCGCGACTGGGAATGCTACTGCGGCA 

AGTACAAGCGGGTGCGCTTCAAGGGCATCATCTGCGAGCGCTGCGGCGTCGAGGTGACCCGC 

GCCAAGGTGCGTCGTGAGCGGATGGGCCACATCGAGCTTGCCGCGCCCGTCACCCACATCTG 

GTACTTCAAGGGTGTGCCCTCGCGGCTGGGGTATCTGCTGGACCTGGCCCCGAAGGACCTGGA 

GAAGATCATCTACTTCGCTGCCTACGTGATCACCTCGGTCGACGAGGAGATGCGCCACAATGAG 

CTCTCCACGCTCGAGGCCGAAATGGCGGTGGAGCGCAAGGCCGTCGAAGACCAGCGCGACGG 
CGAACTAGAGGCCCGGGCGCAAAAGCTGGAGGCCGACCTGGCCGAGCTGGAGGCCGAGGGC

GCCAAGGCCGATGCGCGGCGCAAGGTTCGCGACGGCGGCGAGCGCGAGATGCGCCAGATCC 

GTGACCGCGCGCAGCGTGAGCTGGACCGGTTGGAGGACATCTGGAGCACTTTCACCAAGCTGG 

CGCCCAAGCAGCTGATCGTCGACGAAAACCTCTACCGCGAACTCGTCGACCGCTACGGCGAGT 

ACTTCACCGGTGCCATGGGCGCGGAGTCGATCCAGAAGCTGATCGAGAACTTCGACATCGACG 

CCGAAGCCGAGTCGCTGCGGGATGTCATCCGAAACGGCAAGGGGCAGAAGAAGCTTCGCGCC 

CTCAAGCGGCTGAAGGTGGTTGCGGCGTTCCAACAGTCGGGCAACTCGCCGATGGGCATGGTG 

CTCGACGCCGTCCCGGTGATCCCGCCGGAGCTGCGCCCGATGGTGCAGCTCGACGGCGGCCG 

GTTCGCCACGTCCGACTTGAACGACCTGTACCGCAGGGTGATCAACCGCAACAACCGGCTGAA 

AAGGCTGATCGATCTGGGTGCGCCGGAAATCATCGTCAACAACGAGAAGCGGATGCTGCAGGA 

ATCCGTGGACGCGCTGTTCGACAATGGCCGCCGCGGCCGGCCCGTCACCGGGCCGGGCAACC 

GTCCGCTCAAGTCGCTTTCCGATCTGCTCAAGGGCAAGCAGGGCCGGTTCCGGCAGAACCTGC 

TCGGCAAGCGTGTCGACTACTCGGGCCGGTCGGTCATCGTGGTCGGCCCGCAGCTCAAGCTGC 

ACCAGTGCGGTCTGCCCAAGCTGATGGCGCTGGAGCTGTTCAAGCCGTTCGTGATGAAGCGGC 

TGGTGGACCTCAACCATGCGCAGAACATCAAGAGCGCCAAGCGCATGGTGGAGCGCCAGCGCC 

CCCAAGTGTGGGATGTGCTCGAAGAGGTCATCGCCGAGCACCCGGTGTTGCTGAACCGCGCAC 

CCACCCTGCACCGGTTGGGTATCCAGGCCTTCGAGCCAATGCTGGTGGAAGGCAAGGCCATTC 

AGCTGCACCCGTTGGTGTGTGAGGCGTTCAATGCCGACTTCGACGGTGACCAGATGGCCGTGC 

ACCTGCCTTTGAGCGCCGAAGCGCAGGCCGAGGCTCGCATTTTGATGTTGTCCTCCAACAACAT 

CCTGTCGCCGGCATCTGGGCGTCCGTTGGCCATGCCGCGGCTGGACATGGTGACCGGGCTGT 

ACTACCTGACCACCGAGGTCCCCGGGGACACCGGCGAATACCAGCCGGCCAGCGGGGATCAC 

CCGGAGACTGGTGTCTACTCTTCGCCGGCCGAAGCGATCATGGCGGCCGACCGCGGTGTCTTG 

AGCGTGCGGGCCAAGATCAAGGTGCGGCTGACCCAGCTGCGGCCGCCGGTCGAGATCGAGGC 

CGAGCTATTCGGCCACAGCGGCTGGCAGCCGGGCGATGCGTGGATGGCCGAGACCACGCTGG 

GCCGGGTGATGTTCAACGAGCTGCTGCCGCTGGGTTATCCGTTCGTCAACAAGCAGATGCACAA 

GAAGGTGCAGGCCGCCATCATCAACGACCTGGCCGAGCGTTACCCGATGATCGTGGTCGCCCA 

GACCGTCGACAAGCTCAAGGACGCCGGCTTCTACTGGGCCACCCGCAGCGGCGTGACGGTGT 

CGATGGCCGACGTGCTGGTGCCGCCGCGCAAGAAGGAGATCCTCGACCACTACGAGGAGCGC 

GCGGACAAGGTCGAAAAGCAGTTCCAGCGTGGCGCTTTGAACCACGACGAGCGCAACGAGGC 

GCTGGTGGAGATTTGGAAGGAAGCCACCGACGAGGTCGGTCAGGCGTTGCGGGAGCACTACC 

CCGACGACAACCCGATCATCACCATCGTCGACTCCGGCGCCACCGGCAACTTCACCCAGACTC 

GAACGCTGGCCGGTATGAAGGGCCTGGTGACCAACCCGAAGGGTGAGTTCATCCCGCGTCCG 

GTCAAGTCCTCCTTCCGTGAGGGCCTGACCGTGCTGGAGTACTTCATCAACACCCACGGCGCTC 

GAAAGGGCTTGGCGGACACCGCGTTGCGCACCGCCGACTCCGGCTACCTGACCCGACGTCTG 

GTGGACGTGTCCCAGGACGTGATCGTGCGCGAGCACGACTGCCAGACCGAGCGCGGCATCGT 

CGTCGAGCTGGCCGAGCGTGCACCCGACGGCACGCTGATCCGCGACCCGTACATCGAAACCTC 

GGCCTACGCGCGGACCCTGGGCACCGACGCGGTCGACGAGGCCGGCAACGTCATCGTCGAGC 
GTGGTCAAGACCTGGGCGATCCGGAGATTGACGCTCTGTTGGCTGCTGGTATTACCCAGGTCAA
GGTGCGTTCGGTGCTGACGTGTGCCACCAGCACCGGCGTGTGCGCGACCTGCTACGGGCGTT 
CCATGGCCACCGGCAAGCTGGTCGACATCGGTGAAGCCGTCGGCATCGTGGCCGCCCAGTCC 
ATCGGCGAACCCGGCACCCAGCTGACCATGCGCACCTTCCACCAGGGTGGCGTCGGTGAGGA 
CATCACCGGTGGTCTGCCCCGGGTGCAGGAGCTGTTCGAGGCCCGGGTACCGCGTGGCAAGG 
CGCCGATCGCCGACGTCACCGGCCGGGTTCGGCTCGAGGACGGCGAGCGGTTCTACAAGATC 
ACCATCGTTCCTGACGACGGCGGTGAGGAAGTGGTCTACGACAAGATCTCCAAGCGGCAGCGG 
CTGCGGGTGTTCAAGCACGAAGACGGTTCCGAACGGGTGCTCTCCGATGGCGACCACGTCGAG 
GTGGGCCAGCAGCTGATGGAAGGCTCGGCCGACCCGCATGAGGTGCTGCGGGTGCAGGGCCC 

CCGCGAGGTGCAGATACACCTGGTTCGCGAGGTCCAGGAGGTCTACCGCGCCCAAGGTGTGTC 
GATCCACGACAAGCACATCGAGGTGATCGTTCGCCAGATGCTGCGCCGGGTGACCATCATCGA 
CTCGGGCTCGACGGAGTTTTTGCCTGGCTCGCTGATCGACCGCGCGGAGTTCGAGGCAGAGAA 
CCGCCGAGTGGTGGCCGAGGGCGGTGAGCCCGCGGCCGGCCGTCCGGTGCTGATGGGCATC 
ACGAAGGCGTCGCTGGCCACCGACTCGTGGCTGTCGGCGGCGTCGTTCCAGGAGACCACTCG 

CGTGCTGACCGATGCGGCGATCAACTGCCGCAGCGATAAGCTCAACGGTCTGAAGGAAAACGT 
GATCATCGGCAAGCTGATCCCGGCCGGTACCGGTATCAACCGCTACCGCAACATCGCGGTGCA 
GCCCACCGAGGAGGCCCGCGCTGCGGCGTAGACCATCCCGTCGTATGAGGATCAGTACTACAG 
CCCGGACTTCGGTGCGGCCACCGGTGCTGCCGTCCCGCTGGACGACTACGGCTACAGCGACTA 
CCGCTAG 

>Rv0711 atsA TB.seq 806333:808693 MW:86216 
>emb|AL123456|MTBH37RV:806333-808696, atsA SEQ ID NO:32 

ATGGCACCCGAGGCCACCGAGGCGTTCAACGGCACCATCGAGCTGGATATTCGTGATTCGGAG 
CCGGATTGGGGCCCATAGGCAGCGCCGGTGGCACCGGAGCACTCACCAAACATCCTGTATCTG 

GTCTGGGACGACGTCGGCATCGCGACCTGGGACTGCTTTGGCGGCCTGGTCGAGATGCCCGC 
GATGACGCGCGTCGCCGAGCGTGGCGTGCGACTGTCGCAATTTCACACCACCGCACTGTGCTC 
GCCGACCCGGGCGTCGCTGCTGACCGGTGGCAACGCCACCACCGTAGGCATGGCTACCATCG 
AAGAGTTCAGCGACGGGTTCCCCAACTGCAACGGGCGGATCCCGGCTGACACCGCGTTGCTCC 
CAGAGGTGCTGGCCGAACATGGCTACAACACCTACTGTGTGGGCAAGTGGCACCTGACGCCAC 

TCGAAGAATCCAATATGGCGTCGACGAAGCGGCACTGGCGGACCTCGCGTGGGTTCGAGCGGT 
TCTACGGATTCCTAGGCGGGGAGACCGACCAGTGGTATCCCGACCTGGTATACGACAACCACC 
CAGTGAGTCCTCCCGGCACACCCGAGGGTGGCTACCACCTGTCAAAAGACATCGCCGACAAGA 
CGATCGAGTTCATTCGTGATGCCAAGGTGATCGCGCCCGACAAGCCGTGGTTCAGCTACGTGTG 
CCCAGGCGCCGGGCATGCGCCGCACCACGTCTTCAAGGAATGGGCGGACAGATACGCCGGCC 

GATTCGACATGGGGTATGAGCGCTATCGCGAGATCGTGCTGGAAAGGCAAAAGGCGCTAGGGA 
TCGTGCCACCCGACACCGAACTGTCGCCCATAAACCCTTATCTGGATGTGCCGGGGCCAAACG 
GCGAGACCTGGCCGCTGCAGGACACGGTGCGGCCGTGGGACTCGCTGAGCGATGAAGAAAAG 
AAGCTGTTTTGCCGGATGGCCGAGGTGTTCGCCGGCTTTCTGAGCTACACCGACGCCCAGATC

GGACGGATCCTGGACTACCTCGAGGAATCCGGCCAGCTGGACAACACCATCATCGTGGTGATC 

TCCGACAACGGCGCCAGCGGCGAGGGCGGAGCCAACGGATCGGTCAACGAAGGCAAGTTCTT 

CAACGGCTACATCGACACCGTCGCTGAAAGCATGAAGCTCTTCGACCACCTCGGTGGCCCGCA 

GACCTACAACCACTACCCCATCGGGTGGGCAATGGCCTTCAACACCCCCTACAAGCTGTTCAAG 

CGCTACGCCTCGCATGAAGGCGGCATTGCCGACCCGGCAATCATCTCCTGGCCCAACGGCATT 

GCCGCACACGGTGAAATCCGCGACAACTACGTCAATGTCAGCGACATCACGCCCACCGTCTAC 

GACCTGTTGGGCATGACACCGCCGGGGACCGTCAAGGGGATTCXJGCAGAAACCGATGGACGG 

CGTGAGCTTCATAGCGGCCCTTGCCGACCCGGCCGCCGACACCGGCAAGACCACCCAGTTCTA 

CACCATGCTGGGCACCCGCGGGATCTGGCATGAAGGTTGGTTCGCCAACACCATTCACGCGGC 

CACGCCCGCCGGCTGGTCGAATTTCAACGCTGACCGCTGGGAACTGTTCCACATCGCAGCAGA 

CCGCAGCCAGTGCCACGACCTGGCCGCCGAGCATCCCGACAAACTTGAGGAGCTCAAGGCGCT 

GTGGTTCTCCGAAGCCGCCAAGTACAACGGGCTGCCGCTGGCCGATCTGAACCTCCTGGAAAC 

GATGACTCGGTCGCGGCCTTACCTGGTCAGCGAACGAGCCAGCTACGTCTACTATCCCGACTG 

CGCTGACGTCGGCATCGGCGCGGCCGTAGAGATTCGCGGGCGCTCGTTCGCCGTGCTGGCCG 

ATGTGACCATCGATACCACCGGCGCCGAGGGCGTGCTGTTCAAGCACGGCGGCGCCCATGGC 

GGGCACGTGCTGTTCGTCCGGGACGGACGCTTGCACTACGTCTACAACTTCCTCGGTGAGCGC 

CAGCAGCTGGTCAGCTCGTCGGGTCCGGTCCCGTCGGGAAGACATCTACTCGGGGTTCGTTAT 

TTGCGGACCGGAACCGTGCCCAACAGTCACACGCCGGTGGGCGATCTTGAGCTGTTCTTCGAC 

GAGAACCTGGTCGGCGCCCTGACCAATGTGCTGACCCACCCTGGAACGTTCGGGTTGGCCGGC 

GCCGCTATCAGCGTTGGCCGCAACGGCGGTTCGGCTGTGTCCAGCCACTACGAAGCGCCGTTC 

GCGTTCACCGGCGGTACCATCACCCAGGTCACCGTCGACGTGTCAGGCCGACCGTTCGAAGAT 

GTGGAATCCGATCTTGCGCTTGCTTTTTCGCGTGACTGA 

>Rv0764c - lanosterol 14-demethylase cytochrome P450 TB.seq 856683:858035 MW:50879 
>emb|AL123456|MTBH37RV:c858035-856680. Rv0764c SEQ ID NO:33 

ATGAGCGCTGTTGCACTACCCCGGGTTTCGGGTGGCCACGACGAACACGGCCACCTCGAGGAG 

TTCCGCACCGATCCGATCGGGCTGATGCAACGGGTCCGCGACGAATGCGGAGACGTCGGTACC 

TTCCAGCTGGCCGGGAAGCAGGTCGTGCTGCTGTCCGGCTCGCACGCCAACGAATTCTTCTTC 

CGGGCGGGCGACGACGACCTGGACCAGGCCAAGGCATACCCGTTCATGACGCCGATCTTCGG 

CGAGGGCGTGGTGTTCGACGCCAGCCCGGAACGGCGTAAAGAGATGCTGCACAATGCCGCGC 

TACGCGGCGAGCAGATGAAGGGCCACGCTGCCACCATCGAAGATCAAGTCCGACGGATGATCG 

CCGACTGGGGTGAGGCCGGCGAGATCGATCTGCTGGACTTCTTCGCCGAGCTGACCATCTACA 

CCTCCTCGGCCTGCCTGATCGGCAAGAAGTTCCGCGACCAGCTCGACGGGCGATTCGCCAAGC 

TCTATCACGAGTTGGAGCGCGGCACCGACCCACTAGCCTACGTCGACCCGTATCTGCCGATCG 

AGAGCTTCCGTCGCCGCGACGAAGCCCGCAATGGTCTGGTGGCACTGGTTGCGGACATCATGA 

ACGGCCGGATCGCCAACCCACCCACCGACAAGAGCGACCGTGACATGCTCGACGTGCTCATCG 
CCGTCAAGGCTGAGACCGGCACTCCCCGGTTCTCGGCCGACGAGATCACCGGCATGTTCATCT
CGATGATGTTCGCCGGCCATCACACCAGCTCGGGTACGGCTTCGTGGACGCTGATCGAGTTGA 
TGCGCCATCGCGACGCCTACGCGGCCGTGATCGACGAACTCGACGAGCTGTACGGCGACGGC 
CGATCGGTGAGTTTCCATGCGCTGCGCCAGATTCCGCAGCTGGAAAACGTGCTGAAAGAGACG 
CTGCGCCTGCACCCTCCGCTGATCATCCTCATGCGAGTGGCCAAGGGCGAGTTCGAGGTGCAA 
GGCCACCGGATTCATGAGGGCGATCTGGTGGCGGCCTCCCCGGCGATCTCCAACCGGATCCCC 
GAAGACTTCCCCGATCCCCACGACTTCGTGCCAGCACGATACGAGCAGCCGCGCCAGGAAGAT 
CTGCTCAACCGCTGGACGTGGATTCCGTTCGGCGCCGGCCGGCATCGTTGCGTGGGGGCGGC 
GTTCGCCATCATGCAGATCAAAGCGATCTTCTCGGTGTTGTTGCGCGAGTATGAGTTTGAGATG 
GCGCAACCGCCAGAAAGCTATGGTAACGACCATTCGAAGATGGTGGTGCAGTTGGCCCAGCCC 
GCTTGCGTGCGCTACCGCCGGCGAACGGGAGTTTAA 

>Rv0861c - DNA helicase TB.seq 958524:960149 MW:59773 
>emb|AL123456|MTBH37RV:c960149-958521, Rv0861c SEQ ID NO:34 
GTGCAGTCCGATAAGACGGTGCTGTTGGAAGTCGACCATGAACTGGCCGGCGCTGCACGCGCC 
GCCATCGCGCCGTTCGCCGAGCTGGAACGTGCACCCGAACATGTCCACACCTACCGCATCACA 
CCGCTGGCACTGTGGAATGCTCGCGCCGCCGGCCATGATGCCGAGCAAGTCGTCGACGCGCT 
GGTCAGTTACTCCCGCTACGCGGTGCCGCAACCCTTGCTCGTCGACATCGTCGACACCATGGG 
CCGCTACGGACGACTGCAGTTGGTCAAGAACCCGGCCCATGGCCTGACGCTGGTGAGCCTGGA 
CCGCGCGGTGCTTGAGGAAGTGCTGCGCAACAAGAAGATCGCGCCGATGCTTGGCGCCCGCAT 
CGATGACGACACCGTCGTCGTCCACCCCAGCGAACGCGGCCGGGTCAAGCAGCTGCTGCTCAA 
GATCGGTTGGCCCGCAGAGGATCTCGCCGGCTACGTCGATGGTGAAGCGCACCCGATCAGCCT 
GCACCAGGAGGGCTGGCAGCTGCGCGATTACCAGCGGCTGGCCGCGGACTCGTTCTGGGCGG 
GCGGCTCGGGGGTGGTGGTGCTGCCATGTGGGGCCGGCAAGACGCTGGTCGGTGCGGCCGC 
AATGGCCAAAGCCGGCGCGACGACGTTGATCCTGGTCACCAATATCGTCGCGGCCCGGCAATG 
GAAACGAGAGCTGGTCGCGCGCACCTCGCTCACCGAGAATGAGATCGGCGAATTCTCGGGAGA 
ACGCAAGGAAATCCGACCTGTCACCATCTCGACATACCAGATGATCACCCGCCGCACTAAGGGC 
GAGTACCGCGATCTGGAAGTGTTCGACAGCCGCGACTGGGGGCTCATCATCTATGACGAGGTG 
CACCTGTTGCCGGCACCGGTCTTCCGGATGACCGCTGACCTGCAGTCCAAACGGCGGCTGGGG 
CTGACCGCCACGTTGATCCGTGAAGACGGACGCGAGGGCGACGTGTTTTCCCTTATCGGACCA 
AAGCGCTATGACGCGCCGTGGAAGGACATTGAGGCGCAGGGCTGGATCGCGCCAGCTGAGTG 
CGTGGAAGTCCGGGTCACGATGACCGACAGCGAGCGGATGATGTACGCCACCGCCGAACCCG 
AAGAACGCTACCGGATCTGCTCGACGGTGCACACCAAAATTGCTGTGGTCAAGTCGATTCTGGC 
GAAGCACCCGGATGAGCAGACCCTGGTCATCGGAGCGTACTTGGATCAGCTCGACGAGCTGGG 
CGCCGAGCTCGGCGCTCCGGTGATTCAGGGGTCGACAAGGACCAGCGAACGCGAGGCACTGT 
TCGACGCCTTCCGCCGCGGCGAGGTCGCTACGCTCGTGGTGTCCAAGGTGGCTAACTTCTCCA 
TCGACTTGCCGGAAGCCGCCGTGGCGGTACAGGTTTCGGGAACATTCGGCTCACGCCAGGAAG 
AGGCGCAACGGCTCGGCCGGATATTGCGACCCAAGGCCGACGGGGGCGGTGCCATCTTCTAC

TCGGTGGTGGCCCGCGACAGCCTGGATGCCGAGTACGCCGCACACCGGCAGCGGTTTTTAGCT 

GAGCAGGGCTACGGTTACATCATCCGCGACGCCGACGACCTGCTGGGCCCGGCAATTTAG 

>Rv0904c accD3 TB.seq 1006694:1008178 MW:51741 
>emb|AL123456|MTBH37RV:c1008178-1006691. accD3 SEQ ID NO:35 

GTGAGTCGTATCACGACCGACCAACTGCGGCACGCGGTGCTAGACCGGGGATCTTTCGTCAGC 

TGGGATAGCGAGCCGCTGGCGGTGCCGGTAGCCGACTCCTATGCGCGGGAGCTGGCCGCCGC 

TCGGGCGGCCACCGGCGCGGACGAATCGGTGCAGACCGGTGAGGGACGCGTA7TCGGGCGG 

CGGGTGGCCGTGGTGGCCTGTGAGTTCGACTTCCTGGGCGGCTCGATTGGGGTGGCAGCGGC 

CGAACGGATCACCGCCGCCGTCGAGCGGGCGACCGCCGAGCGGCTGCCGCTACTGGCGTCAC 

CAAGCTCGGGAGGCACCCGCATGCAAGAAGGCACGGTCGCGTTTCTGCAGATGGTGAAGATCG 

CTGCGGCCATCCAGCTGCACAACCAGGCGCGCCTGCCCTACCTGGTCTATTTGCGCCATCCGA 

CCACGGGTGGAGTTTTCGCGTCGTGGGGCTCGCTGGGGCATCTCACCGTCGCCGAGCCGGGC 

GCCCTGATCGGCTTTCTGGGACCACGGGTCTATGAGTTGCTCTATGGCGACCCCTTCCCATCCG 

GCGTCCAAACCGCCGAGAATCTACGGCGGCATGGGATCATCGACGGCGTCGTTGCAGTGGACC 

GGCTACGACCGATGCTGGATCGTGCGTTGACGGTGCTCATCGACGCTCCCGAAGCGCTTCCGG 

CACCGCAGACGCCCGCGCCCGTACCCGATGTGCCCACGTGGGACTCGGTGGTGGCATCGCGC 

CGGCCGGACCGGCCGGGCGTCAGGCAGCTACTGCGACACGGCGCCACCGACCGGGTGTTGTT 

GTCAGGAACCGATCAAGGCGAAGCGGCGACCACGCTGCTGGCGCTGGCCCGCTTTGGCGGCC 

AACCCACGGTGGTCCTCGGCCAGCAAAGGGCAGTAGGCGGCGGGGGAAGCACTGTCGGGCCC 

GCTGCGTTACGCGAAGCCCGACGCGGGATGGCGCTCGCCGCCGAGCTGTGCCTGCCGCTGGT 

GCTGGTCATTGACGCGGCCGGACCCGCGTTGTCGGCCGCAGCCGAACAGGGCGGGCTGGCCG 

GCCAGATCGCGCATTGCCTGGCCGAGCTCGTCACGCTGGATACCCCGACCGTGTCGATCCTGC 

TGGGCCAGGGCAGCGGCGGGCCGGCGCTGGCGATGTTGCCCGCCGACCGGGTGCTGGCCGC 

ACTCCACGGCTGGCTGGCGCCCTTGCCTCCCGAAGGAGCCAGCGCGATCGTGTTCCGAGACAC 

TGCTCATGCCGCCGAACTCGCTGCCGCCCAAGGCATCCGGTCGGCCGACCTACTGAAGTCGGG 

GATTGTCGACACCATCGTGCCGGAGTACCCCGACGCCGCAGACGAGCCGATCGAGTTCGCCCT 

ACGACTGTCGAACGCCATCGCCGCCGAAGTGCACGCGTTACGGAAGATACCGGCCCCGGAACG 

CCTCGCGACTCGGTTGCAAGGCTACCGCCGGATCGGGTTGCCCCGCGACTAA 

>Rv0983 - TB.seq 1099064:1100455 MW:46454 

>emb|AL123456|MTBH37RV:1099064-1100458. Rv0983 SEQ ID NO:36 

ATGGCCAAGTTGGCCCGAGTAGTGGGCCTAGTACAGGAAGAGCAACCTAGCGACATGACGAAT 
CACCCACGGTATTCGCCACCGCCGCAGCAGCCGGGAACCCCAGGTTATGCTCAGGGGCAGCA 
GCAAACGTACAGCCAGCAGTTCGACTGGCGTTACCCACCGTCCCCGCCCCCGCAGCCAACCCA 
GTACCGTCAACCCTACGAGGCGTTGGGTGGTACCCGGCCGGGTCTGATACCTGGCGTGATTCC 
GACCATGACGCCCCCTCCTGGGATGGTTCGCCAACGCCCTCGTGCAGGCATGTTGGCCATCGG

CGCGGTGACGATAGCGGTGGTGTCCGCCGGCATCGGCGGCGCGGCCGCATCCCTGGTCGGGT 

TCAACCGGGCACCCGCCGGCCCCAGCGGCGGCCCAGTGGCTGCCAGCGCGGCGCCAAGCAT 

CCCCGCAGCAAACATGCCGCCGGGGTCGGTCGAACAGGTGGCGGCCAAGGTGGTGCCCAGTG 

TCGTCATGTTGGAAACCGATCTGGGCCGCCAGTCGGAGGAGGGCTCCGGCATCATTCTGTCTG 

CCGAGGGGCTGATCTTGACCAACAACCACGTGATCGCGGCGGCCGCCAAGCCTCCCCTGGGC 

AGTCCGCCGCCGAAAACGACGGTAACCTTCTCTGACGGGCGGACCGCACCCTTCACGGTGGTG 

GGGGCTGACCCCACCAGTGATATCGCCGTCGTCCGTGTTCAGGGCGTCTCCGGGCTCACCCCG 

ATCTCCCTGGGTTCCTCCTCGGACCTGAGGGTCGGTCAGCCGGTGCTGGCGATCGGGTCGCCG 

CTCGGTTTGGAGGGCACCGTGACCACGGGGATCGTCAGCGCTCTCAACCGTCCAGTGTCGACG 

ACCGGCGAGGCCGGCAACCAGAACACCGTGCTGGACGCCATTCAGACCGACGCCGCGATCAA 

CCCCGGTAACTCCGGGGGCGCGCTGGTGAACATGAACGCTCAACTCGTCGGAGTCAACTCGGC 

CATTGCCACGCTGGGCGCGGACTCAGCCGATGCGCAGAGCGGCTCGATCGGTCTCGGTTTTGC 

GATTCCAGTCGACCAGGCCAAGCGCATCGCCGACGAGTTGATCAGCACCGGCAAGGCGTCACA 

TGCCTCCCTGGGTGTGCAGGTGACCAATGACAAAGACACCCTGGGCGCCAAGATCGTCGAAGT 

AGTGGCCGGTGGTGCTGCCGCGAACGCTGGAGTGCCGAAGGGCGTCGTTGTCACCAAGGTCG 

ACGACCGCCCGATCAACAGCGCGGACGCGTTGGTTGCCGCCGTGCGGTCCAAAGCGCCGGGC 

GCCACGGTGGCGCTAACCTTTCAGGATCCCTCGGGCGGTAGCCGCACAGTGCAAGTCACCCTC 

GGCAAGGCGGAGCAGTGA 

>Rv1008 - Similar to E.coli protein YcfH TB.seq 1127087:1127878 MW:29066 
>emb|AL123456|MTBH37RV:1127087-1127881. Rv1008 SEQ ID NO:37 

TTGGTCGACGCCCACACCCATCTCGACGCGTGCGGTGCACGAGACGCCGATACGGTGCGGTC 

GCTCGTCGAGCGAGCCGCCGCGGCCGGCGTGACCGCGGTGGTCACCGTCGCCGACGACCTG 

GAGTCCGCGCGCTGGGTCACCCGCGCGGCCGAATGGGATCGGCGAGTCTATGCCGCGGTGGC 

GTTGCACCCGACCCGCGCCGATGCGCTCACCGACGCTGCCCGTGCCGAGCTCGAGCGATTGG 

TTGCCCACCCCAGGGTGGTGGCCGTCGGTGAGACCGGAATCGACATGTACTGGCCGGGTCGC 

CTGGACGGGTGTGCGGAGCCGCACGTCCAGCGGGAGGCCTTTGCCTGGCATATCGATCTGGC 

CAAGCGGACCGGTAAACCGCTGATGATCCACAATCGTCAGGCCGACCGCGACGTGCTGGACGT 

GCTGCGGGCCGAGGGCGCGCCGGACACCGTGATCTTGCACTGCTTCTCGTCGGACGCGGCGA 

TGGCCCGCACGTGTGTGGACGCCGGGTGGCTGCTCAGCCTGTCCGGGACGGTGAGCTTCCGT 

ACCGCCCGTGAACTACGGGAAGCCGTCCCGCTGATGCCGGTGGAGCAGCTTTTGGTGGAAACC 

GATGCACCGTATTTGACCCCGCATCCCCACCGGGGCTTGGCGAACGAACCGTACTGCCTGCCC 

TATACCGTGCGGGCGCTGGCTGAACTGGTCAATCGGCGCCCCGAAGAGGTGGCGCTCATCACC 

ACAAGCAACGCTCGCCGAGCTTATGGGCTAGGGTGGATGCGCCAATGA 
>Rv1009 - lipoprotein, similar to various other MTB proteins TB.seq 1128089:1129174 MW:38079 
>emb|AL123456|MTBH37RV:1128089-1129177. Rv1009 SEQ ID NO:38 

ATGTTGCGCCTGGTAGTCGGTGCGCTGCTGCTGGTGTTGGCGTTCGCCGGTGGCTATGCGGTC 

GCCGCATGCAAAACGGTGACGTTGACCGTCGACGGAACCGCGATGCGGGTGACCACGATGAAA 

TCGCGGGTGATCGACATCGTCGAAGAGAACGGGTTCTCAGTCGACGACCGCGACGACCTGTAT 

CCCGCGGCCGGCGTGCAGGTCCATGACGCCGACACCATCGTGCTGCGGCGTAGCCGTCCGCT 

GCAGATCTCGCTGGATGGTCACGACGCTAAGCAGGTGTGGACGACCGCGTCGACGGTGGACG 

AGGCGCTGGCCCAACTCGCGATGACCGACACGGCGCCGGCCGCGGCTTCTCGCGCXiAGCCGC 

GTCCCGCTGTCCGGGATGGCGCTACCGGTCGTCAGCGCCAAGACGGTGCAGCTCAACGACGG 

CGGGTTGGTGCGCACGGTGCAC7TGCCGGCCCCCAATGTCGCGGGGCTGCTGAGTGCGGCCG 

GCGTGCCGCTGTTGCAAAGCGACCACGTGGTGCCCGCCGCGACGGCCCCGATCGTCGAAGGC 

ATGCAGATCCAGGTGACCCGCAATCGGATCAAGAAGGTCACCGAGCGGCTGCCGCTGCCGCCG 

AACGCGCGTCGTGTCGAGGACCCGGAGATGAACATGAGCCGGGAGGTCGTCGAAGACCCGGG 

GGTTCCGGGGACCCAGGATGTGACGTTCGCGGTAGCTGAGGTCAACGGCGTCGAGACCGGCC 

GTTTGCCCGTCGCCAACGTCGTGGTGACCCCGGCCXJACGAAGCCGTGGTGCGGGTGGGCACC 

AAGCCCGGTACCGAGGTGCCCCCGGTGATCGACGGAAGCATCTGGGACGCGATCGCCGGCTG 

TGAGGCCGGTGGCAACTGGGCGATCAACACCGGCAACGGGTATTACGGTGGTGTGCAGTTTGA 

CCAGGGCACCTGGGAGGCCAACGGCGGGCTGCGGTATGCACCCCGCGCTGACCTCGCCACCC 

GCGAAGAGCAGATCGCCGTTGCCGAGGTGACCCGACTGCGTCAAGGTTGGGGCGCCTGGCCG 

GTATGTGCTGCACGAGCGGGTGCGCGCTGA 

>Rv1010 ksgA 16S rRNA dimethyltransferase TB.seq 1129150:1130100 MW:34647 
>emb|AL123456|MTBH37RV:1129150-1130103. ksgA SEQ ID NO:39 

ATGTGCTGCACGAGCGGGTGCGCGCTGACCATCCGGCTGCTCGGGCGCACTGAGATCAGGCG 

GCTGGCCAAAGAGCTCGACTTTCGGCCGCGCAAATCTCTCGGACAGAACTTCGTGCACGACGC 

CAACACGGTGCGACGGGTGGTTGCCGCCTCCGGGGTCAGCCGTTCCGACCTGGTTTTGGAGGT 

CGGGCCGGGCCTGGGATCGCTGACCCTGGCACTGCTCGACCGCGGCGCGACCGTCACCGCGG 

TCGAGATCGATCCACTACTGGCTTCTCGGCTGCAACAGACCGTGGCGGAGCACTCGCACAGCG 

AGGTTCACCGACTAACGGTGGTCAATCGCGACGTCCTGGCCCTGCGCCGGGAGGATCTAGCGG 

CGGCGCCGACCGCGGTGGTTGCCAATCTGCCGTACAACGTAGCGGTACCGGCGTTGTTGCATC 

TGCTTGTCGAGTTCCCGTCGATCCGTGTCGTGACGGTGATGGTGCAGGCCGAGGTCGCCGAAC 

GGCTCGCCGCCGAGCCGGGCAGCAAAGAGTACGGCGTGCCCAGCGTTAAGCTGCGCTTCTTC 

GGGCGGGTTCGCCGCTGCGGCATGGTGTCGCCGACCGTTTTCTGGCCCATTCCGCGTGTCTAT 

TCCGGGCTGGTACGCATCGATCGATATGAGACCTCGCCCTGGCCCACCGACGACGCTTTTCGA 

CGGCGGGTATTCGAACTCGTGGACATCGCATTCGCGCAGCGGCGCAAGACTTCTCGCAACGCG 

TTTGTGCAGTGGGCGGGCTCGGGAAGCGAGTCGGCGAATCGATTGTTGGCGGCCAGCATCGAC 

CCCGCCCGTCGCGGTGAGACGCTGTCCATCGACGACTTCGTGCGGCTGCTGCGACGGTCCGG 
CGGCTCCGACGAGGCCACCAGCACCGGCCGGGACGCCAGGGCGCCGGACATTTCGGGGCAC
GCGTCGGCGAGCTGA 

>Rv1011 - Homology to E.coli protein YcbH TB.seq 1130189:1131106 MW:31350 

>emb|AL123456|MTBH37RV:1130189-1131109. Rv1011 SEQ ID NO:40 

GTGCCCACCGGGTCGGTCACCGTTCGGGTGCCCGGAAAGGTCAACCTCTATCTGGCGGTCGGC 
GATCGCCGCGAGGACGGCTATCACGAGCTGACCACGGTATTTCATGCCGTCTCGCTGGTCGAC 
GAGGTAACCGTTCGTAACGCTGATGTGCTCTCGCTCGAGTTGGTCGGCGAGGGGGCCGACCAG 
CTGCCGACCGACGAACGCAATCTCGCCTGGCAGGCGGCCGAGCTGATGGCCGAACACGTGGG 

CCGGGCGCCGGACGTCTCGATCATGATCGACAAATCCATTCCGGTCGCCGGCGGCATGGCCG 
GTGGCAGCGCGGACGCTGCGGCGGTCCTGGTTGCGATGAACTCGTTGTGGGAACTCAATGTGC 
CCCGCCGCGACCTGCGCATGCTCGCCGCGCGGCTAGGCAGCGATGTGCCGTTTGCXSCTGCAT 
GGTGGTACCGCGCTGGGGACGGGTCGCGGCGAGGAGTTGGCCACCGTGTTATCCCGCAACAC 
CTTCCACTGGGTCCTGGCGTTCGCCGACAGCGGGTTGCTCACCTCCGCGGTGTACAACGAGCT 

CGACCGGCTCAGGGAGGTGGGGGATCCGCCCCGGCTTGGTGAGCCCGGGCCGGTTCTGGCTG 
CCTTAGCTGCGGGTGATCCGGATCAGCTGGCGCCGTTGCTGGGTAATGAAATGCAAGCGGCCG 
CGGTGAGCCTGGACCCGGCGCTGGCTCGTGCGTTACGCGCCGGTGTGGAGGCCGGCGCGCTC 
GCAGGCATCGTGTCCGGTTCGGGTCCCACGTGTGCCTTCCTGTGCACCTCGGCGAGCTCGGCG 
ATCGATGTCGGCGCGCAGCTGTCGGGGGCGGGAGTTTGTCGCACCGTTCGAGTCGCCACCGG 

GCCGGTACCCGGCGCCCGCGTGGTGTCTGCGCCGACCGAAGTGTGA 

>Rv1106c - cholesterol dehydrogenase TB.seq 1232845:1233954 MW:40743 
>emb|AL123456|MTBH37RV:c1233954-1232842. Rv1106c SEQ ID NO:41 
ATGCTTCGCCGCATGGGTGATGCATCGCTGACAACCGAGCTCGGCCGCGTTCTGGTCACCGGC 

GGCGCGGGCTTCGTGGGCGCCAACCTGGTGACCACCTTGCTGGACCGCGGGCACTGGGTGCG 
TTCCTTCGACCGCGCGCCGTCGCTGTTGCCTGCGCATCCGCAACTGGAGGTGCTGCAAGGGGA 
CATCACCGACGCGGACGTCTGCGCCGCGGCCGTGGACGGCATCGACACGATCTTCCACACCG 
CAGCGATCATCGAGCTGATGGGCGGCGCGTCGGTCACCGACGAGTACCGCCAACGTAGCTTTG 
CGGTCAACGTCGGCGGCACCGAGAACCTGCTGCACGCCGGCCAGCGGGCCGGGGTGCAGCG 

GTTCGTCTACACGTCATCCAACAGTGTGGTGATGGGCGGCCAGAACATCGCCGGCGGTGACGA 
GACGCTGCCCTATACCGACCGGTTCAACGACCTCTACACCGAGACCAAGGTGGTTGCCGAGCG 
ATTCGTGTTGGCCCAGAACGGTGTCGACGGCATGCTGACGTGCGCGATCCGGCCCAGCGGCAT 
CTGGGGAAACGGCGATCAGACGATGTTCCGCAAGCTGTTCGAAAGTGTGCTCAAGGGCCACGT 
CAAGGTGCTGGTCGGGCGCAAGTCGGCCCGGCTGGATAACTCTTACGTGCACAACCTGATTCA 

CGGTTTCATCTTGGCXJGCTGCCCATCTGGTGCCGGACGGCACAGCGCCCGGGCAGGCTTACTT 
CATCAACGACGCAGAGCCGATCAATATGTTCGAGTTCGCTCGGCCGGTGCTCGAGGCGTGCGG 
GCAGCGCTGGCCGAAGATGCGGATTTCCGGCCCCGCGGTCCGCTGGGTAATGACGGGGTGGC 
AGCGGCTGCACTTCCGGTTCGGATTCCCCGCGCCGCTGCTCGAGCCGCTGGCCGTCGAACGAC
TGTACCTGGACAACTACTTTTCGATCGCTAAGGCACGCCGCGACCTGGGCTATGAGCCGCTGTT 
CACCACCCAGCAGGCGCTGACCGAATGCCTGCCGTACTACGTGAGTCTGTTTGAGCAGATGAA 
GAACGAGGCCCGGGCGGAAAAAACGGCCGCCACAGTCAAGCCGTAG 

>Rv1110 lytB2 TB.seq 1236183:1237187 MW:36298 
>emb|AL123456|MTBH37RV:1236183-1237190. lytB' SEQ ID NO:42 

ATGGTTCCGACGGTCGACATGGGGATTCCCGGGGCTTCGGTATCGTCGCGATCGGTGGCCGAC 

CGTCCCAACCGTAAGCGGGTGCTGCTGGCCGAGCCGCGTGGCTACTGCGCTGGCGTGGATCG 

GGCCGTCGAAACGGTCGAACGCGCGCTTCAAAAACACGGCCCGCCTGTCTACGTGCGTCACGA 

GATCGTGCATAACCGCCACGTGGTTGACACCCTGGCTAAGGCCGGTGCGGTTTTCGTCGAAGA 

GACCGAGCAGGTTCCXGAGGGAGCGATTGTGGTGTTCTCCGCGCACGGGGTCGCGCCTACGG 

TGCACGTCAGCGCCAGCGAGCGCAACCTGCAGGTCATTGACGCCACCTGCCCGCTGGTCACCA 

AGGTGCACAACGAGGCCAGGCGGTTCGCCCGGGACGACTACGACATCTTGCTGATCGGTCATG 

AGGGCCACGAGGAAGTCGTCGGTACTGCTGGGGAAGCTCCCGATCATGTGCAGCTGGTCGACG 

GGGTGGACGCCGTCGACCAGGTGACCGTCCGTGACGAGGACAAAGTGGTTTGGCTGTCGCAG 

ACCACCCTGTCCGTCGATGAGACCATGGAGATTGTCGGGCGGTTGCGTCGGCGTTTCCCCAAG 

CTGCAGGATCCGCCCAGCGACGACATCTGCTATGCGACCCAGAATCGGCAGGTCGCGGTCAAG 

GCGATGGCGCCCGAGTGCGAGCTGGTCATCGTGGTCGGCTCGCGCAATTCGTCGAATTCGGTT 

CGGCTGGTCGAGGTGGCGCTGGGTGCCGGGGCGCGGGCCGCCCACCTGGTGGACTGGGCCG 

ACGATATCGACTCGGCCTGGCTGGACGGCGTTACCACGGTCGGCGTTACGTCGGGGGCATCGG 

TCCCCGAGGTGCTGGTGCGCGGTGTGCTGGAGCGGCTGGCCGAATGCGGCTACGACATCGTG 

CAACCGGTGACAACGGCCAACGAGACGTTGGTGTTCGCATTGCCCCGGGAGCTCCGCTCACCT 

CGCTGA 

>Rv1216c - TB.seq 1359473:1360144 MW:24863 

>emb|AL123456|MTBH37RV:c1360144-1359470. Rv1216c SEQ ID NO:43 

ATGCACATTGGGCTGAAGATATTCATATGGGGCGTGTTAGGACTCGTCGTTTTCGGCGCGCTCC 

TATTCGGGCCAGCCGGCACGTTCGACTATTGGCAGGCGTGGGTGTTCCTCGCCGCATTTGTGA 

GCACCACGATTGGCCCCACAATCTATCTGGCTCGGAACGATCCCGCGGCCCTTCAACGTCGCAT 

GCGCAGCGGTCCGCTCGCGGAGGGCCGAACGATTCAGAAGTTCATCGTCATCGGCGCTTTTCT 

GGGGTTCTTCGCGATGATGGTGCTGAGCGCGTGCGACCATCGTTATGGTTGGTCGTCAGTGCC 

AGCCGCGGTGTGCGTGATCGGCGACGTCCTAGTGATGACGGGCCTTGGCATCGCCATGCTGGT 

GGTCATCCAGAACAGGTATGCCGCCTCGACGGTCAGGGTGGAGGCGGGCCAGATATTGGCCTC 

CGACGGTCTCTACAAAATTGTCCGACACCCGATGTACGCCGGGAACGTGGTCATGATGACAGG 

CATACCGCTGGCACTGGGCTCTTACTGGGCGATGTTCATCCTCGTCCCCGGCACACTGGTGTTG 
GTGTTCCGCATCCTCGACGAGGAAAAACTACTGACGCAAGAACTCAGCGGGTACCGCGAATACC
GGCAACTGGTGCGCTACCGGTTGGTGCCCTACGTGTGGTAG 

>Rv1223 htrA TB.seq 1365810:1367456 MW:56547 
>emb|AL123456|MTBH37RV:1365810-1367459. htrA SEQ ID NO:44 

gtgagccacttgtcgcagcgcatggcggggttgctgcgagttcatggcgagtggtcgcgatcc 
gtggatactagggtggacacggacaacgcgatgcctgcacgttttagcgcccagattcagaat 
gaggatgaggtgacctccgaccaaggcaacaacggcggcccgaacggcggaggccgcctggc 
gccgcgcccggtttttcggccaccggtcgacccggcgtcgcgtcaagcgttcgggcgtccgt 

ccggggtccaagggtcctttgtggccgagcgtgtgcgcccgcagaagtaccaggaccagtct 
gacttcacaccgaacgatcagcttgctgacccggtgcttcaggaggcgttcggtcgtccgttc 
gcgggcgccgaatcgctgcagcgccatcccatcgatgccggagcgctggcagctgagaaaga 
cggtgccggccccgacgagcccgacgatccgtggcgcgaccccgcggccgcggccgcgctg 
gggacgccagcgctagccgcgccggcaccgcacggtgcgctggccggcagcggcaagctgg 

gtgtgcgcgacgtgctgtttggcggcaaggtgtcctacttggcgctgggcatcttggtcgcta 
tcgcactggtgatcggcggcatcggcggtgtcatcggccgcaagaccgcggaagtagtcgat 
gcgttcaccacgtcgaaggtgaccctgtcgaccactggcaatgcccaggaaccggcx:ggccg 
gttcaccaaggtggcggccgccgtggccgattcggtggtgaccattgagtcggtcagcgacca 
ggagggcatgcaaggttccggcgtcatcgtcgatggccgcggctacatcgtcaccaacaatca 

cgtgatctctgaggcggccaacaatcccagccagttcaagacgaccgtggtgttcaacgacgg 
caaggaggtgcccgccaatctggtgggtcgtgaccccaagaccgacttggccgtcctcaaggt 
cgacaacgtcgacaatctgaccgtggcccggctcggtgattccagcaaggtacgggtcggtga 
cgaagtcctcgcggtcggcgcgcccctggggctgcgcagtacggtgacccagggcattgtca 
gcgcgctacaccgccccgttccgttgtcgggcgagggctctgacaccgacaccgtcattgacg 

caattcagaccgacgcctcgatcaaccacggtaactccggcggtccgctaatcgacatggatgc 
cgaggtgattggcatcaacaccgccggtaagtcactgtcggatagcgccagcgggctgggctt 
tgcgatcccggtcaacgagatgaaattggtggcaaattctctgatcaaagacggaaagatcgtg 
catccgacgttgggcatcagcacccggtcagtaagcaacgcgatcgcgtcgggcgcgcaggt 
ggccaatgtaaaggcgggaagtcccgcgcagaagggcgggatcttggagaacgatgtgatcgt 

caaggtcggtaaccgcgcggtcgccgactccgacgagttcgtcgtcgccgtgcgccagttgg 
ctatcggccaggacgctccgatagaggtggtccgcgagggtcggcatgtgacgctgacggtg 
aaaccggaccccgatagcacctag 

>Rv1224 - TB.seq 1367461:1367853 MW:14083 
>emb|AL123456|MTBH37RV:1367461-1367856, Rv1224 SEQ ID NO:45 

GTGTTCGCCAACATCGGTTGGTGGGAAATGCTCGTCCTCGTCATGGTCGGGCTGGTGGTGCTT 
GGCCCGGAGCGGCTCCCGGGTGCCATCCGCTGGGCGGCAAGCGCTCTGCGGCAGGCGCGCG 
ACTATCTCAGCGGTGTGACCAGCCAGCTACGTGAGGACATTGGACCCGAATTCGATGATCTGCG
GGGACATCTCGGTGAGCTGCAGAAGCTACGGGGAATGACTCCGCGGGCTGCGTTGACCAAGCA 
CCTACTGGATGGCGATGATTCCCTGTTCACCGGAGACTTCGACCGACCGACGCCGAAGAAACC 
GGATGCGGCGGGCTCGGCGGGGCCGGACGCTACTGAGCAGATCGGTGCGGGGCCCATCCCG 
TTTGACAGCGATGCCACCTAG 

>Rv1229c mrp similar to MRP/NBP35 ATP-binding proteins TB.seq 1371778:1372947 MW:41064 
>emb|AL123456|MTBH37RV:c1372947-1371775. mrp SEQ ID NO:46 

ATGCCAAGCCGCCTACACTCGGCGGTGATGTCCGGAACTCGTGATGGCGACCTGAACGCGGCG 

ATACGCACCGCGCTGGGCAAGGTAATCGACCCCGAATTGCGGCGCCCCATCACCGAACTGGGG
ATGGTCAAAAGCATCGACACCGGCCCGGATGGGAGCGTGCACGTCGAGATCTACCTGACCATC 
GCCGGCTGCCCGAAGAAGTCCGAAATCACCGAGCGTGTCACCCGGGCGGTCGCCGACGTGCC 
AGGCACTTCGGCGGTGCGGGTCAGCTTGGACGTGATGAGCGACGAGCAGCGCACCGAGCTGC 
GTAAGCAGTTGCGTGGCGATACCCGCGAACCCGTCATCCCGTTCGCGCAACCCGATTCCTTGAC 

CCGGGTGTATGCCGTGGCTTCCGGTAAGGGCGGAGTCGGAAAGTCCACCGTCACGGTCAACCT
GGCCGCCGCGATGGCCGTCCGCGGCCTGTCGATCGGGGTGCTGGACGCTGATATCCACGGCX: 
ACTCTATCCCCCGGATGATGGGCACCACCGACCGGCCTACCCAGGTTGAGTCGATGATCCTGC 
CGCCGATCGCCCACCAGGTGAAGGTCATCTCGATAGCCCAGTTCACCCAGGGCAACACCCCGG 
TGGTGTGGCGCGGGCCGATGCTGCACCGGGCGTTGCAGCAGTTTCTGGCCGACGTGTACTGG 

GGGGATCTGGACGTGCTGCTGCTGGACTTGCCGCCCGGAACCGGCGACGTCGCCATCTCGGT
GGCTCAACTGATCCCCAACGCCGAACTCCTGGTGGTCACCACCCCGCAGCTGGCCGCCGCGGA 
GGTGGCCGAACGGGCCGGCAGCATCGCGCTGCAAACCCGCCAACGCATCGTCGGCGTCGTGG 
AGAACATGTCGGGGCTCACGCTGCCGGACGGCACCACGATGCAGGTGTTCGGCGAGGGCGGT 
GGCCGGCTGGTCGCCGAGCGGTTGTCGCGTGCGGTCGGCGCCGACGTGCCGCTGCTGGGTCA 

GATCCCGCTGGACCCCGCACTGGTGGCCGCCGGCGATTCGGGCGTACCGCTCGTGTTGAGCT
CGCCGGACTCGGCGATCGGCAAGGAACTGCATAGCATCGCCGACGGCTTGTCGACTCGACGAC 
GCGGATTGGCGGGCATGTCGCTGGGGTTGGACCCGACACGACGCTAG 

>Rv1239c corA magnesium and cobalt transport protein TB.seq 1381943:1383040 MW:41470 

>emb|AL123456|MTBH37RV:c1383040-1381940. corA SEQ ID NO:47

GTGTTCCCAGGGTTTGACGCATTGCCCGAAGTGCTGCGACCGGTCGCGCGACCCCAGCCGCXJG 
AACGCACACX:CCGTTGCCCAGCCACCGGCCCAAGCCTTGGTCGACTGCGGTGTCTACGTCTGC 
GGCCAGCGACTGCCCGGCAAGTACACCTACGCCGCCGCGCTGCGCGAGGTGCGCGAGATCGA 
ACTGACCGGGCAGGAGGCGTTCGTCTGGATCGGGCTGCACGAGCCCGATGAAAACCAGATGCA 

GGACGTAGCAGACGTTTTCGGGTTGCACCCGTTAGCCGTTGAGGACGCCGTGCACGCGCACCA
GCGACCCAAGTTGGAGCGCTACGACGAGACGCTGTTCCTCGTCCTCAAGACCGTCAACTACGT 
CCCGCACGAATCGGTGGTACTGGCCCGCGAGATCGTCAAAACCGGCGAGATCATGATCTTCGT 
CGGCAAGGATTTCGTGGTCACCGTCCGCCACGGCGAACACGGCGGGTTATCCGAGGTGCGTAA

GCGGATGGATGCCGACCCCGAACATTTGCGGTTGGGACGGTATGCGGTGATGCACGCGATCGC 

CGACTACGTGGTCGACCACTACCTCGAGGTGACCAATCTCATGGAGACCGATATCGACAGCATC 

GAGGAAGTAGCGTTCGCGCCGGGCCGCAAGCTCGACATCGAACCGATCTATCTGCTCAAGCGG 

GAAGTGGTCGAGTTGCGCCGGTGCGTGAATCCGCTATCGACCGCATTCCAGCGCATGCAGACC 

GAGAGCAAAGACCTCATTTCGAAAGAAGTGCGGCGCTACCTGCGCGACGTCGCCGACCACCAG 

ACCGAGGCCGCCGACCAGATCGCCAGCTACGACGACATGCTCAACTCGCTGGTGCAGGCCGC 

GCTCGCCCGGGTCGGCATGCAGCAAAACATGGACATGCGCAAGATATCGGCGTGGGCAGGTAT 

CATCGCGGTCCCCACCATGATCGCGGGCATCTATGGCATGAACTTTCACTTCATGCCCGAGCTG 

GAGTCCAGGTGGGGTTACCCGACAGTGATCGGCGGGATGGTCCTTATCTGTCTGTTCCTCTACG 

ACGTCTTCCGCAACAGAAACTGGCTCTAG 

>Rv1279 - TB.seq 1430060:1431643 MW:57332

>emb|AL123456|MTBH37RV:1430060-1431646, Rv1279 SEQ ID NO:48 

ATGGACAGTCAGAGCGACTACGTCGTGGTCGGTACCGGCTCAGCCGGGGCGGTTGTGGCCAG 

CCGGCTTAGCACCGATCCGGCCACGACGGTGGTGGCCCTGGAGGCGGGGCCGCGTGACAAGA 

ACAGATTCATCGGCGTCCCAGCGGCGTTTTCCAAGCTGTTCCGCAGCGAGATCGACTGGGATTA 

CCTAACCGAACCGCAGCCGGAGGTCGACGGCCGCGAAATCTATTGGCCTCGTGGCAAGGTGCT 

CGGTGGCTCGTCGTCCATGAACGGAATGATGTGGGTGCGTGGATTCGCATCAGACTACGATGA 

GTGGGCCGCGCGAGCCGGTCCGCGGTGGTCGTACGCCGACGTGCTCGGCTACTTTCGCCGCA 

TCGAGAAGGTCACCGCTGCCTGGCACTTTGTCAGCGGTGACGACAGCGGAGTAACCGGTCCGT 

TGCATATTTCCCGGCAACGCAGCCCAAGATCGGTGACCGCAGCGTGGCTGGCAGCCGCACGTG 

AGTGCGGATTTGCCGCTGCGCGGCCGAATTCCCCTCGACCGGAAGGCTTTTGCGAGACCGTCG 

TCACCCAGCGCCGCGGTGCTCGATTCAGTACTGCCGACGCCTATCTGAAGCCCGCGATGCGCC 

GTAAAAACCTCCGTGTGCTTACCGGCGCCACTGCTACCCGGGTGGTCATCGACGGCGACCGGG 

CCGTCGGCGTGGAATACCAAAGCGACGGTCAAACCCGCATCGTCTACGCCCGCCGCGAGGTG 

GTGCTCTGCGCTGGTGCCGTCAAGAGCCCTCAGCTGCTGATGCTCTCCGGCATCGGCGACCGC 

GACCACCTCGCCGAACACGACATCGACACCGTTTACCACGCGCCCGAGGTCGGGTGCAACCTG 

CTCGATCATCTCGTCACGGTGCTGGGTTTCGACGTCGAAAAGGACAGCTTGTTTGCCGCCGAGA 

AGCCCGGCCAGTTGATCAGCTACTTACTGCGACGCCGCGGCATGCTCACCTCCAACGTCGGCG 

AGGCGTACGGATTTGTCCGCAGCCGACCCGAACTGAAGCTGCCCGATTTGGAGTTGATTTTTGC 

CCCGGCGCCGTTTTACGACGAAGCGCTGGTTCCACCGGCTGGTCACGGTGTGGTATTCGGCCC 

GATTCTGGTCGCGCCGGAAAGCCGTGGCCAGATCACGCTGCGGTCCGCCGATCCGCATGCCAA 

GCCTGTCATCGAACCGCGTTACCTGTCCGATCTCGGTGGCGTAGACCGGGCCGCCATGATGGC 

GGGCCTGCGGATATGCGCGCGGATCGCGCAGGCCCGCCCGCTCAGAGATCTCCTTGGGTCCA 

TCGCGCGACCGCGCAACAGCACCGAGCTGGACGAGGCCACTCTCGAGTTGGCGCTGGCCACT 

TGTTCGCACACCCTGTACCACCCGATGGGCACCTGCCGCATGGGCAGCGACGAGGCCAGCGT 
GGTGGATCCGCAGCTGCGGGTCCGCGGTGTCGACGGACTCCGCGTCGCCGACGCGTCGGTGA

TGCCCAGCACGGTTCGTGGGCATACGCATGCGCCGTCGGTGCTGATCGGGGAGAAGGCCGCC 

GACTTAATCCGCAGCTGA 

>Rv1294 thrA homoserine dehydrogenase TB.seq 1449373:1450695 MW:45522
>emb|AL123456|MTBH37RV:1449373-1450698. thrA SEQ ID NO:49 

GTGCCCGGTGACGAAAAGCCGGTCGGCGTAGCGGTACTCGGTTTGGGCAACGTCGGCAGCGA 
GGTTGTCCGCATCATCGAGAACAGCGCCGAGGATCTCGCGGCTCGTGTCGGTGCCCCATTGGT 
CCTGCGGGGCATCGGCGTGCGCCGCGTGACGACCGATCGCGGCGTGCCGATCGAATTGTTGA 

CCGACGACATTGAAGAGCTCGTGGCCCGCGAGGATGTCGATATCGTGGTGGAAGTGATGGGGC
CGGTGGAACCGTCGCGCAAGGCGATCCTGGGCGCCCTTGAGCGCGGCAAGTCCGTCGTTACG 
GCGAACAAGGCTTTACTCGCCACCTCCACCGGCGAATTGGCACAGGCCGCCGAAAGCGCCCAT 
GTTGATCTGTATTTCGAGGCGGGCGTGGCGGGCGCCATTCCGGTCATCCGTCCGCTCACCCAG 
TCGCTGGCCGGCGACACGGTGCTGCGAGTGGCCGGGATCGTCAACGGCACCACCAACTACATC 

CTCTCGGCGATGGACAGCACCGGCGCTGACTATGCCAGCGCCCTGGCCGACGCAAGTGCGCT
GGGCTATGCGGAGGCTGATCCCACCGCAGACGTCGAAGGCTACGACGCCGCGGGCAAGGCAG 
CGATGCTGGCATCCATTGCCTTCCACACCCGGGTGACCGCAGACGACGTGTATCGCGAAGGCA 
TCACCAAGGTCACTCCGGCCGACTTCGGATCCGCGCACGCGCTGGGTTGCACCATCAAACTGC 
TGTCGATCTGTGAGCGCATAACCACCGACGAAGGTTCGCAGCGGGTATCGGCCCGCGTCTATC 

CGGCCCTGGTACCTCTGTCGCATCCGCTTGCCGCGGTCAACGGCGCGTTCAATGCCGTGGTGG
TCGAGGCCGAGGCCGCGGGCCGGCTGATGTTCTACGGCCAGGGCGCGGGCGGCGCGCCGAC 
CGCCTCTGCGGTGACCGGTGACCTAGTGATGGCCGCCCGCAACCGGGTACTCGGCAGCCGCG 
GCCCCCGTGAGTCTAAATACGCTCAACTTCCGGTGGCACCAATGGGTTTCATTGAAACGCGCTA 
TTACGTCAGCATGAACGTCGCCGACAAGCCGGGCGTCTTGTCCGCGGTGGCGGCGGAATTCGC 

CAAACGCGAGGTGAGCATCGCCGAGGTGCGCCAGGAGGGCGTTGTGGACGAAGGTGGTCGAC
GGGTGGGAGCCCGAATCGTGGTGGTCACGCACCTCGCCACTGACGCCGCACTCTCGGAAACC 
GTTGATGCACTGGACGACTTGGATGTCGTGCAGGGTGTGTCCAGCGTGATACGACTGGAAGGA 
ACCGGCTTATGA 

>Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thiL) TB.seq 1485860:1487026 MW:40049
>emb|AL123456|MTBH37RV:1485860-1487029. fadA4 SEQ ID NO:50

GTGATTGTTGCTGGCGCGCGTACACCCATCGGCAAGTTGATGGGCTCCCTGAAGGATTTCAGCG 
CCAGCGAGCTGGGTGCCATCGCCATTAAGGGCGCCCTGGAGAAGGCCAACGTGCCGGCGTCC 
TTGGTCGAGTACGTGATCATGGGCCAGGTGTTGACCGCGGGTGCCGGGCAAATGCCCGCACG 
GCAGGCGGCAGTGGCGGCCGGCATCGGTTGGGATGTCCCTGCGCTGACGATCAACAAGATGT
GCCTGTCCGGCATCGACGCAATCGCGCTGGCTGATCAACTCATTCGGGCCAGAGAGTTCGACG 
TGGTGGTGGCCGGCGGTCAGGAGTCGATGACGAAGGCGCCCCACCTGTTGATGAATAGCCGGT 
CGGGTTACAAGTACGGCGACGTTACGGTTTTGGACCACATGGCCTACGACGGTCTGCACGACG

TGTTCACCGATCAGCCGATGGGCGCGCTCACCGAGCAACGCAACGACGTCGACATGTTCACCC 

GCTCCGAACAGGACGAGTACGCGGCTGCGTCCCACCAAAAGGCGGCCGCGGCATGGAAGGAC 

GGCGTATTCGCCGACGAGGTGATCCCGGTGAACATCCCGCAGCGCACGGGCGATCCACTGCA 

GTTCACCGAGGACGAGGGGATCCGCGCCAACACCACCGCCGCCGCGCTGGCCGGTCTGAAGC 

CGGCGTTCCGTGGCGACGGCACCATCACCGCCGGGTCGGCGTCACAGATCTGCGACGGTGCG 

GCCGCGGTGGTGGTCATGAACCAGGAAAAGGCCCAGGAACTGGGGCTGACCTGGCTAGCCGA 

GATCGGCGCCCACGGTGTGGTGGCCGGGCCGGATTCCACACTGCAATCGCAGCCGGCCAACG 

CGATCAACAAGGCGCTGGATCGCGAGGGCATCTCGGTGGACCAGCTCGACGTGGTGGAGATCA 

ACGAGGCGTTCGCTGCGGTGGCATTGGCCTCGATACGCGAACTCGGGCTGAACCCCCAGATCG 

TCAACGTCAACGGTGGTGCGATTGCCGTCGGGCATCCCCTCGGCATGTCAGGGACGCGAATCA 

CGCTACATGCGGCGCTGCAGTTGGCACGCCGGGGATCGGGCGTCGGGGTTGCCGCATTGTGC 

GGGGCTGGCGGGCAGGGCGACGCACTGATATTGCGGGCCGGATAG 

>Rv1389 gmk putative guanylate kinase TB.seq 1564399:1565022 MW:22064
>emb|AL123456|MTBH37RV:1564399-1565025, gmk SEQ ID NO:51 

GTGAGCGTCGGCGAGGGACCGGACACCAAGCCCACCGCGCGTGGCCAACCGGCGGCAGTGG 

GACGTGTGGTGGTGCTGTCCGGTCCTTCCGCGGTCGGCAAATCCACGGTGGTTCGGTGTCTGC 

GCGAGGGGATCCCGAATCTGCATTTCAGTGTCTCGGCCACGACGCGGGCGCCACGCCCGGGC 

GAGGTCGACGGTGTCGACTACCACTTCATCGACCCCACCCGCTTTCAGCAGCTCATCGACCAG 

GGTGAGTTGCTGGAATGGGCAGAAATCCACGGCGGCCTGCACCGGTCGGGCACTTTGGCCCA 

GCCGGTGCGGGCGGCCGCGGCGACTGGTGTGCCGGTGCTTATCGAGGTTGACCTGGCCGGGG 

CCAGGGCGATCAAGAAGACGATGCCCGAGGCTGTCACCGTGTTTCTGGCGCCACCTAGCTGGC 

AGGATCTTCAGGCCAGACTGATTGGCCGCGGCACCGAAACAGCTGACGTTATCCAACGCCGCC 

TGGACACCGCGCGGATCGAATTGGCAGCGCAGGGCGACTTTGACAAGGTCGTGGTGAACAGGC 

GATTAGAGTCTGCGTGTGCGGAATTGGTATCCTTGCTGGTGGGAACGGCACCGGGCTCCCCGT 

GA 

>Rv1407 fmu similar to Fmu protein TB.seq 1583099:1584469 MW:48494 
>emb|AL123456|MTBH37RV:1583099-1584472, fmu SEQ ID NO:52

ATGACCCCTAGATCGCGTGGGCCGCGCCGCCGGCCGCTGGACCCGGCGCGTCGTGCGGCCTT 

CGAGACGCTGCGGGCGGTTAGTGCGCGCGACGCCTACGCGAACCTGGTGTTGCCCGCGCTGC 

TGGCCCAACGCGGTATCGGCGGTCGCGACGCCGCGTTCGCCACCGAGCTGACATACGGCACC 

TGCCGAGCCCGCGGCGTGCTCGACGCGGTCATCGGTGCGGCCGCCGAGCGTTCGCCGCAGGC 

GATCGATCCGGTGCTGCTAGACCTGTTGCGGCTCGGCACGTACCAATTGCTGCGCACGCGGGT 

CGACGCACACGCCGCAGTGTCGACCACCGTCGAGCAGGCCGGAATCGAATTCGATTCGGCGC 

GAGCAGGTTTCGTCAACGGTGTACTACGAACGATCGCCGGCCGAGACGAGCGGTCCTGGGTTG 
GCGAACTCGCTCCTGATGCGCAGAACGATCCGATCGGGCATGCCGCGTTCGTGCATGCGCATC

CCCGATGGATCGCCCAGGCCTTTGCTGACGCGTTGGGCGCGGCGGTCGGGGAGCTCGAGGCA 

GTTTTGGCCAGCGACGACGAACGGCCAGCGGTGCACCTGGCGGCACGCCCCGGGGTGCTGAC 

CGCCGGCGAACTGGCCCGCGCGGTGCGCGGAACCGTCGGTCGGTATTCGCCGTTTGCGGTGT 

ATCTGCCGCGCGGTGACCCGGGGCGACTGGCGCCGGTGCGCGACGGCCAAGCGCTGGTCCA 

GGACGAGGGCAGCCAGTTAGTCGCCCGAGCATTGACCCTGGCGCCAGTCGACGGCGATACCG 

GACGGTGGCTGGACCTGTGTGCCGGACCGGGCGGCAAGACCGCGCTGTTGGCCGGGCTGGGT 

TTGCAGTGCGCAGCCCGGGTGACCGCGGTGGAACCCTCGCCACACCGCGCGGACCTGGTAGC 

ACAGAACACCCGCGGGCTGCCGGTTGAGCTCTTGCGTGTCGACGGGCGGCACACCGACCTCG 

ACCCGGGTTTCGACCGGGTGCTGGTGGATGCGCCCTGCACCGGGCTGGGCGCGTTACGCCGT 

CGGCCGGAGGCCCGTTGGCGTCGTCAGCCGGCGGACGTAGCGGCACTGGCCAAGCTACAACG 

CGAGTTGTTGAGCGCCGCCATCGCGCTGACTCGGCCCGGCGGTGTCGTGCTCTATGCCACATG 

CTCGCCGCACCTGGCCGAGACTGTGGGTGCTGTCGCCGACGCGCTACGCCGACATCCGGTTCA 

CGCGCTCGATACCCGCCCACTGTTCGAGCCGGTGATCGCGGGGCTGGGGGAGGGGCCCCACG 

TTCAGCTGTGGCCGCACCGGCACGGTACCGACGCCATGTTCGCCGCGGCGTTGCGCCGCCTG 

ACGTGA 

>Rv1409 ribG riboflavin biosynthesis TB.seq 1585192:1586208 MW:35367
>emb|AL123456|MTBH37RV:1585192-1586211. ribG SEQ ID NO:53

ATGAACGTGGAGCAGGTCAAGAGCATCGACGAGGCTATGGGTCTCGCCATCGAGCACTCCTAC 

CAGGTCAAAGGCACGACTTATCCAAAACCCCCAGTGGGGGCCGTCATTGTGGATCCCAACGGT 

CGGATCGTCGGCGCCGGCGGCACCGAGCCGGCCGGTGGCGATCATGCCGAGGTGGTGGCGC 

TGCGCCGGGCCGGCGGATTGGCTGCCGGCGCCATCGTGGTGGTCACCATGGAACCCTGTAAC 

CACTACGGCAAGACTCCGCCATGCGTGAACGCTCTGATCGAAGCCAGGGTGGGGACGGTGGTC 

TACGCCGTCGCCGACCCGAACGGGATCGCTGGGGGTGGCGCGGGCCGGCTGTCAGCAGCGG 

GCCTACAGGTGCGGTCCGGGGTGTTGGCTGAACAGGTGGCGGCCGGACCGCTGCGGGAGTGG 

CTCCACAAGCAACGCACCGGTCTGCCGCATGTCACCTGGAAGTACGCCACCAGCATCGACGGC 

CGCAGCGCCGCCGCCGACGGCTCCAGCCAGTGGATCTCCAGCGAGGCCGCACGCCTGGATCT 

GCATCGCCGCCGCGCCATCGCCGACGCGATCTTGGTCGGCACCGGCACCGTCCTCGCCGACG 

ACCCGGCCGTGACCGCGCGGCTGGCCGACGGCTCGCTGGCGCCGCAGCAGCCGCTGCGCGT 

GGTGGTGGGCAAGCGCGACATACCGCCGGAAGCACGGGTCGTCAACGACGAGGCACGCAGCA 

TGATGATGCGCACCCACGAACCTATGGAGGTGGTCAGGGCGTTGTCGGATCGCACCGACGTGC 

TGCTGGAAGGAGGTCCCACCCTCGCCGGCGCCTTCCTACGAGCGGGTGCGATCAACCGGATCC 

TGGCCTACGTCGCACCGATCCTGTTGGGCGGTCCGGTTACCGCGGTCGATGACGTCGGGGTGT 

CCAACATCACCAACGCGTTGCGTTGGCAGTTCGACAGCGTCGAAAAGGTCGGACCGGATCTGTT 

GCTGAGCTTGGTGGCTCGTTAG 
>Rv1440 secG TB.seq 1617715:1618065 MW:12140
>emb|AL123456|MTBH37RV:1617715-1618068. secG SEQ ID NO:54 

GTGGCAGGCGTGACAGCCGCGGTCAGTGCACGCCTCAAAGCCGATGAGGCGCGACGGCCTGG 
GTTCTACGCGGCAGGCAGCGGTCCGCTGCCGCAGGTTCGGGGGAGTACGCTACCCGTCATGG 
AATTGGCCCTGCAGATCACGCTGATCGTCACGAGCGTGCTGGTGGTGTTGTTAGTACTGCTGCA
CCGGGCCAAGGGTGGCGGGCTATCGACACTGTTCGGCGGTGGTGTGCAGTCAAGCCTGTCCG 
GCTCGACGGTGGTGGAGAAGAACCTGGACCGGTTGACGCTGTTCGTTACCGGCATCTGGCTGG 
TGTCCATCATCGGCGTGGCGTTGCTCATCAAATACCGCTAG 

>Rv1484 inhA TB.seq 1674200:1675006 MW:28529

>emb|AL123456|MTBH37RV:1674200-1675009. inhA SEQ ID NO:55 

ATGACAGGACTGCTGGACGGCAAACGGATTCTGGTTAGCGGAATCATCACCGACTCGTCGATCG 

CGTTTCACATCGCACGGGTAGCCCAGGAGCAGGGCGCCCAGCTGGTGCTCACCGGGTTCGAC 

CGGCTGCGGCTGATTCAGCGCATCACCGACCGGCTGCCGGCAAAGGCCCCGCTGCTCGAACT 

CGACGTGCAAAACGAGGAGCACCTGGCCAGCTTGGCCGGCCGGGTGACCGAGGCGATCGGGG
CGGGCAACAAGCTCGACGGGGTGGTGCATTCGATTGGGTTCATGCCGCAGACCGGGATGGGC 
ATCAACCCGTTCTTCGACGCGCCCTACGCGGATGTGTCCAAGGGCATCCACATCTCGGCGTATT 
CGTATGCTTCGATGGCCAAGGCGCTGCTGCCGATCATGAACCCCGGAGGTTCCATCGTCGGCA 
TGGACTTCGACCCGAGCCGGGCGATGCCGGCCTACAACTGGATGACGGTCGCCAAGAGCGCG 

TTGGAGTCGGTCAACAGGTTCGTGGCGCGCGAGGCCGGCAAGTACGGTGTGCGTTCGAATCTC
GTTGCCGCAGGCCCTATCCGGACGCTGGCGATGAGTGCGATCGTCGGCGGTGCGCTCGGCGA 
GGAGGCCGGCGCCCAGATCCAGCTGCTCGAGGAGGGCTGGGATCAGCGCGCTCCGATCGGCT 
GGAACATGAAGGATGCGACGCCGGTCGCCAAGACGGTGTGCGCGCTGGTGTCTGACTGGCTG 
CCGGCGACCACGGGTGACATCATCTACGCCGACGGCGGCGCGCACACCCAATTGCTCTAG 


>Rv1617 pykA pyruvate kinase TB.seq 1816187:1817602 MW:50668

>emb|AL123456|MTBH37RV:1816187-1817605. pykA SEQ ID NO:56

GTGACGAGACGCGGGAAAATCGTCTGCACTCTCGGGCCGGCCACCCAGCGGGACGACCTGGT 
CAGAGCGCTGGTCGAGGCCGGAATGGACGTCGCCCGAATGAACTTCAGCCACGGCGACTACGA 

CGATCACAAGGTCGCCTATGAGCGGGTCCGGGTAGCCTCCGACGCCACCGGGCGCGCGGTCG
GCGTGCTCGCCGACCTGCAGGGCCCGAAGATCAGGTTGGGACGCTTCGCCTCCGGGGCCACC 
CACTGGGCCGAAGGCGAAACCGTCCGGATCACCGTGGGCGCCTGCGAGGGCAGCCACGATCG 
GGTGTCCACCACCTACAAGCGGCTAGCCCAGGACGCGGTGGCCGGTGACCGGGTGCTGGTCG 
ACGACGGCAAAGTCGCATTGGTGGTCGACGCCGTCGAGGGCGACGACGTGGTCTGCACCGTC 

GTCGAAGGCGGCCCGGTCAGCGACAACAAGGGCATGTCGTTGCCCGGAATGAACGTGACCGC
GCCGGCCCTGTCGGAGAAGGACATCGAGGATCTCACGTTCGCGCTGAACCTCGGCGTCGACAT 
GGTGGCGCTTTCCTTCGTCCGCTCCCCGGCCGATGTCGAACTGGTCCACGAGGTGATGGATCG 
GATCGGGCGACGGGTGCCGGTGATCGCCAAGCTGGAGAAGCCGGAAGCCATCGACAATCTCG

AAGCGATCGTGCTGGCGTTCGACGCCGTCATGGTCGCTCGGGGCGACCTAGGTGTTGAGCTGC 

CGCTCGAAGAGGTCCCGCTGGTACAGAAGCGAGCCATCCAGATGGCCCGGGAGAACGCCAAG 

CCGGTCATTGTGGCGACCCAGATGCTCGACTCGATGATCGAGAACTCGCGGCCGACCCGAGCT 

GAGGCCTCCGACGTCGCCAACGCGGTGCTCGATGGCGCCGACGCGCTGATGCTGTCCGGGGA 

AACCTCGGTAGGGAAGTACCCCCTTGCTGCGGTCCGGACAATGTCGCGCATCATCTGCGCGGT 

CGAGGAGAACTCCACGGCCGCACCGCCGTTGACACACATTCCCCGGACCAAGCGTGGGGTCAT 

CTCGTATGCGGCCCGTGACATCGGCGAACGACTCGACGCCAAGGCCTTGGTGGCCTTCACTCA 

GTCCGGTGATACCGTGCGGCGACTGGCCCGCCTGCATACCCCGCTGCCGCTGCTGGCCTTCAC 

CGCGTGGCCCGAGGTGCGCAGCCAACTGGCGATGACCTGGGGCACCGAGACGTTCATCGTGC 

CGAAGATGCAGTCCACCGATGGCATGATCCGCCAGGTCGACAAATCGCTGCTCGAACTCGCCC 

GCTACAAGCGTGGTGACTTGGTGGTCATCGTCGCGGGTGCGCCGCCAGGCACAGTGGGTTCGA 

CCAACCTGATCCACGTGCACCGGATCGGGGAAGATGACGTCTAG 

>Rv1630 rpsA 30S ribosomal protein S1 TB.seq 1833540:1834982 MW:53203 
>emb|AL123456|MTBH37RV:1833540-1834985. rpsA SEQ ID NO:57

ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCGAGGAC 

TTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCGAAGGCAGCA 

TCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCGAAGGCGTGATCC 

CCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTCGTTTCCGTCGGTGACG 

AGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGCTCATCCTCTCCAAGAAAC 

GCGCGCAGTACGAGCGTGCCTGGGGCACCATCGAGGCGCTCAAGGAGAAGGACGAGGCCGTC 

AAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCCTCGACATCGGGCTGCGCGGTTTC 

CTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCGACCTGCAGCCCTACATCGGCAAGGA 

GATCGAGGCCAAGATCATCGAGCTGGACAAGAACCGCAACAACGTGGTGCTGTCCCGTCGCGC 

CTGGCTGGAGCAGACCCAGTCCGAGGTGCGCAGCGAGTTCCTGAATAACTTGCAAAAAGGCAC 

CATCCGAAAGGGTGTCGTGTCCTCGATCGTCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGT 

GGACGGTCTGGTGCATGTCTCCGAGCTATGGTGGAAGCACATCGACCACCCGTCCGAGGTGGT 

CCAGGTTGGTGACGAGGTCACCGTCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTC 

GTTGTCACTCAAGGCGACTCAGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGG 

GCAGATCGTGCCGGGCAAGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGA 

GGGTATCGAGGGCCTGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATC 

AGGTGGTTGCCGTCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTC 

GGATCTCGTTGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGT 

ACGGCATGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCCGAGGGCTTCGATGCCG 

AAACCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCG 

AGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGGCG 
GCTGGACGCGGCGCGGACGATCAGTCGTCGGCCAGTAGCGCACCGTCGGAAAAGACCGCGGG
TGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCAGCGCTT 

GA 

>Rv1631 - TB.seq 1835011:1836231 MW:44669

>emb|AL123456|MTBH37RV:1835011-1836234. Rv1631 SEQ ID NO:58 

ATGCTGCGCATCGGGCTGACCGGCGGCATTGGCGCCGGGAAGTCGTTGCTGTCCACGACGTTC 

TCGCAATGCGGCGGAATCGTTGTCGACGGCGATGTGTTGGCGCGTGAAGTGGTCCAGCCGGGC 

ACCGAGGGGCTGGCCTCGCTGGTCGACGCGTTCGGTCGCGACATCCTGCTTGCAGACGGAGC 

GCTGGACCGGCAGGCGTTGGCGGCCAAGGCGTTTCGAGATGACGAGTCGCGCGGTGTGCTCA 

ACGGAATCGTGCACCCGCTGGTCGCCCGGCGCCGATCCGAGATCATCGCGGCGGTTTCGGGG 

GACGCGGTTGTGGTCGAAGATATTCCACTGCTGGTGGAATCCGGGATGGCGCCATTGTTTCCGC 

TGGTGGTGGTGGTGCACGCCGACGTCGAGCTACGGGTGCGACGGCTGGTCX3AGCAACGCGGC 

ATGGCCGAAGCCGACGCCCGGGCTAGGATCGCTGCGCAGGCCAGCGACCAGCAGCGTCGTGC 

CGTCGCCGACGTCTGGCTGGACAACTCGGGCAGCXCAGAGGATTTGGTGCGGCGGGCCCGCG 

ACGTCTGGAACACGCGCGTCCAGCCCTTCGCGCACAACCTGGCCCAACGTCAGATTGCGCGCG 

CGCCGGCTAGGTTGGTGCCGGCGGATCCAAGCTGGCCGGATCAGGCGCGGCGCATCGTCAAC 

CGGCTAAAGATCGCGTGCGGGCATAAGGCCTTGCGAGTTGACCACATTGGGTCAACCGCCGTG 

TCGGGCTTCCCCGATTTTCTAGCCAAGGATGTCATCGACATCCAGGTCACCGTCGAATCACTTG 

ACGTGGCCGACGAGCTGGCCGAGCCCTTGCTGGCCGCCGGCTACCCACGCCTCGAGCACATC 

ACCCAGGACACCGAAAAGACCGACGCTCGCAGCACCGTCGGCCGCTACGACCACACCX3ACAGT 

GCCGCTCTGTGGCACAAGCGCGTGCACGCCTCGGCGGATCCCGGTCGGCCGACCAACGTGCA 

CCTGCGGGTGCACGGCTGGCCCAACCAACAGTTCGCCCTGCTGTTCGTCGACTGGCTGGCGGC 

CAATCGCGGCGCGAGAGAAGACTATTTGACGGTCAAGTGTGACGCCGACAGGCGCGCCGACG 

GTGAGCTCGCGCGCTACGTCACCGCCAAGGAGCCGTGGTTCCTGGATGCCTACCAGCGGGCAT 

GGGAGTGGGCGGATGCGGTGCACTGGCGTCCCTGA 

>Rv1706c - TB.seq 1932695:1933876 MW:39779 
>emb|AL123456|MTBH37RV:c1933876-1932692. PPE SEQ ID NO:59

ATGACCCTCGATGTCCCGGTCAACCAGGGGCATGTCCCCCCGGGCAGCGTCGCCTGCTGCCTT 

GTTGGGGTCACCGCCGTTGCTGACGGCATCGCCGGGCATTCCCTGTCCAACTTTGGGGCGTTA 

CCTCCCGAGATCAATTCGGGTCGTATGTATAGCGGTCCGGGATCCGGGCCACTGATGGCTGCC 

GCGGCGGCCTGGGACGGGCTGGCCGCAGAGTTGTCGTCGGCAGCGACTGGCTACGGTGCGG 

CGATCTCGGAGCTGACAAACATGCGGTGGTGGTGGGGGCCGGCATCGGATTCGATGGTGGCC 

GCCGTCCTGCCCTTTGTCGGCTGGCTGAGTACCACCGCGACGCTAGCCGAACAGGCCGCGATG

CAGGCTAGGGCGGCCGCAGCGGCCTTTGAAGCCGCCTTCGCCATGACGGTGCCCCCGCCGGC 

GATCGCGGCCAACCGGACCTTGTTGATGACGCTCGTCGATACCAACTGGTTCGGGCAAAACAC 
GCCGGCGATCGCCACCACCGAGTCCCAATACGCCGAGATGTGGGCCCAAGACGCCGCCGCGA

TGTACGGCTATGCCAGCGCCGCGGCACCCGCCACGGITTTGACTCCGTrCGCACCACCGCCGC 

AAACCACCAACGCGACCGGCCTCGTCGGCCACGCAACAGCGGTGGCCGCGCTGCGGGGGCAG 

CACAGCTGGGCCGCGGCGATTCCATGGAGCGACATACAGAAATACTGGATGATGTTCCTGGGC 

GCCCTCGCCACTGCCGAAGGGTTCATTTACGACAGCGGTGGGTTAACGCTGAATGCTCTGCAGT 

TCGTCGGCGGGATGTTGTGGAGCACCGCATTGGCAGAAGCCGGTGCGGCCGAGGCAGCGGCC 

GGCGCGGGTGGAGCCGCTGGATGGTCGGCGTGGTCGCAGCTGGGAGCTGGACCGGTGGCGG 

CGAGCGCGACTCTGGCCGCCAAGATCGGACCGATGTCGGTGCCGCCGGGCTGGTCCGCACCG 

CCCGCCACGCCCCAGGCGCAAACCGTCGCGCGATCGATTCCCGGTATTCGCAGCGCCGCCGA 

GGCGGCTGAAACATCGGTCCTACTCCGGGGGGCACCGACTCCGGGCAGGAGTCGCGCCGCCC 

ATATGGGACGCCGATATGGAAGACGACTCACCGTGATGGCTGACCGGCCGAACGTCGGATAG 

>Rv1745c - similar to Q46822 ORF_0182 TB.seq 1971381:1971989 MW:22490 
>emb|AL123456|MTBH37RV:c1971989-1971378. Rv1745c SEQ ID NO:60 

ATGACCCGCAGCTACCGGCCAGCTCCACCGATCGAGCGGGTGGTTTTGCTCAACGACCGCGGC 

GACGCGACAGGTGTGGCCGACAAGGCCACCGTGCACACCGGCGACACCCCTTTGCACCTCGC 

GTTCTCCAGCTATGTGTTCGATCTGCACGATCAGCTGTTGATCACGCGGCGGGCCGCCACCAAG 

AGGACGTGGCCGGCGGTATGGACCAACAGTTGCTGCGGGCACCCCCTGCCTGGCGAATCGCT 

ACCCGGCGCCATACGCCGGCGGCTCGCTGCCGAACTCGGACTGACCCCAGATCGGGTCGATC 

TGATCCTGCCGGGGTTCCGCTACCGGGCCGCTATGGCCGATGGCACCGTGGAAAACGAGATCT 

GCCCCGTCTACCGAGTCCAGGTTGACCAACAGCCCCGGCCGAACTCGGACGAGGTCGACGCG 

ATCCGCTGGTTGTCCTGGGAACAATTCGTGCGCGATGTTACCGCCGGCGTAATCGCCCCGGTAT 

CCCCTTGGTGCCGCTCACAACTGGGCTACCTGACCAAACTTGGACCATGTCCGGCACAGTGGC 

CCGTGGCCGACGACTGCCGGCTACCGAAAGCCGCACATGGTAATTAA 

>Rv1800 - TB.seq 2039451:2041415 MW:67068
>emb|AL123456|MTBH37RV:2039451-2041418. PPE SEQ ID NO:61 

ATGCTGCCGAATTTCGCGGTGCTGCCCCCCGAGGTCAATTCGGCGAGGGTGTTCGCCGGTGCG 

GGGTCGGCGCCGATGTTAGCGGCAGCGGCCGCCTGGGATGATCTAGCCTCCGAGCTGCATTGT 

GCTGCAATGTCATTCGGGTCGGTTACGTCGGGATTGGTGGTTGGGTGGTGGCAGGGATCGGCG 

TCGGCGGCGATGGTGGACGCAGCCGCGTCGTACATCGGGTGGCTGAGCACGTCGGCTGCCCA 

CGCCGAGGGCGCGGCCGGTCTGGCTCGGGCCGCGGTATCGGTGTTCGAGGAGGCGCTGGCC 

GCGACGGTGCATCCGGCGATGGTTGCGGCAAATCGCGCCCAGGTGGGGTCGCTGGTAGCGTC 

GAACTTGTTTGGGCAGAACGCGCCTGCGATCGCCGCGCTCGAATCCTTGTATGAGTGTATGTGG 

GCCCAGGATGCAGCGGCCATGGCGGGTTATTACGTTGGGGCTTCGGCGGTGGCCACACAGTTG 

GCATCGTGGCTGCAACGGCTACAGAGCATCCCCGGCGCCGCCAGTCTTGATGCCCGTCTGCCG 

AGCTCGGCCGAGGCACCGATGGGAGTCGTCCGCGCGGTCAACAGCGCGATCGCCGCCAATGC 
GGCTGCGGCACAAACCGTTGGCCTGGTCATGGGAGGCAGCGGCACGCCAATACCGTCGGCCA

GATATGTCGAGCTCGCGAACGCGCTGTACATGAGTGGCAGCGTCCCGGGTGTTATCGCGCAGG 

CGCTCTTCACGCCCCAAGGGCTCTACCCGGTGGTCGTGATCAAGAACCTCACTTTCGATTCCTC 

GGTGGCGCAGGGTGCCGTCATTCTCGAAAGTGCGATTCGGCAGCAAATTGCCGCCGGCAACAA 

CGTCACCGTCTTCGGCTACTCGCAGAGCGCCACGATCTCGTCACTAGTGATGGCCAATCTTGCG 

GCTTCGGCCGACCCGCCGTCTCCAGACGAGCTTTCCTTCACGCTGATCGGCAATCCCAACAACC 

CCAATGGCGGGGTTGCCACCAGGTTCCCGGGGATCTCCTTTCCAAGCTTGGGCGTGACGGCCA 

CCGGGGCCACTCCGCACAATCTGTACCCGACCAAGATCTACACCATCGAATACGACGGCGTCG 

CCGACTTTCCGCGGTACCCGCTCAACTTTGTGTCGACCCTCAACGCCATTGCCGGCACCTACTA 

CGTGCACTCCAACTACTTCATCCTGACGCCGGAACAAATTGACGCAGCGGTTCCGCTGACCAAT 

ACGGTCGGTCCCACGATGACCCAGTACTACATCATTCGCACGGAGAACCTGCCGCTGCTAGAG 

CCACTGCGATCGGTGCCGATCGTGGGGAACCCACTGGCGAACCTGGTTCAACCAAACTTGAAG 

GTGATTGTTAACCTGGGCTACGGCGACCCGGCCTATGGTTATTCGACCTCGCCGCCCAATGTTG 

CGACTCCGTTCGGGTTGTTCCCAGAGGTCAGCCCGGTCGTCATCGCCGACGCTCTCGTCGCCG 

GGACCCAGCAGGGAATCGGCGATTTCGCCTACGACGTCAGCCACCTCGAACTGCCGTTGCCGG 

CAGACGGGTCGACGATGCCAAGCACCGCACCGGGCTCGGGTACGCCGGTCCCCCCGCTCTCG 

ATCGACAGCCTGATAGACGACCTGCAGGTGGCTAACCGCAACCTCGCCAACACGATTTCGAAG 

GTGGCCGCGACGAGCTACGCGACGGTGCTCCCAACCGCCGACATCGCCAATGCGGCGTTGAC 

GATCGTGCCGTCGTACAACATCCACCTTTTTTTGGAGGGCATCCAGCAAGCGCTCAAGGGCGAC 

CCGATGGGACTCGTCAACGCGGTCGGATACCCACTCGCGGCCGACGTGGCACTGTTCACGGCC 

GCAGGCGGTCTTCAGCTCTTGATCATCATCAGCGCGGGCCGAACGATTGCCAATGACATCTCGG 

CCATTGTCCCCTGA 

>Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram -) TB.seq 2093732:2095186 MW:51548
>emb|AL123456|MTBH37RV:c2095186-2093729. gnd SEQ ID NO:62

ATGAGTTCGTCGGAATCGCCAGCCGGCATCGCGCAGATCGGCGTCACTGGCCTGGCCGTGATG 

GGTTCCAACATCGCCCGAAACTTCGCCCGGCACGGCTACACCGTGGCAGTGCACAATCGGTCG 

GTCGCCAAGACCGACGCGCTGCTTAAGGAGCACAGCTCAGACGGCAAGTTCGTGCGCAGTGAA 

ACGATCCCCGAATTTCTTGCCGCACTGGAAAAACCGCGTCGGGTGCTGATCATGGTCAAGGCC 

GGAGAGGCCACTGACGCTGACGCTGTCATCAACGAACTTGCTGACGCCATGGAACCCGGCGAC 

ATCATCATCGACGGCGGCAATGCGTTGTACACCGACACGATGCGCCGCGAGAAAGCGATGCGT 

GAGCGGGGCTTGCACTTCGTCGGGGCCGGGATCTCCGGCGGCGAAGAGGGCGCGTTGAACGG 

GCCGTCGATCATGCCCGGCGGACCCGCCGAGTCATACCAATCGCTGGGTCCGCTGCTCGAGGA 

GATCTCCGCGCATGTCGACGGCGTGCCGTGCTGCACCCACATTGGCCCGGACGGCTCCGGGC 

ACTTCGTCAAGATGGTCCACAACGGCATCGAGTACTCCGACATGCAGCTCATCGGTGAGGCCTA 

CCAGCTGATGCGCGACGGGCTAGGTCTGACCGCGCCGGCGATCGCCGATGTGTTCACCGAGT 

GGAACAATGGCGATCTGGACAGCTACCTGGTCGAGATCACCGCCGAGGTGCTGCGGCAGACCG 
ATGCCAAGACCGGCAAACCGCTCGTCGACGTCATCGTGGACCGGGCCGAGCAGAAAGGCACC

GGCCGTTGGACCGTCAAGTCCGCGCTGGACCTGGGTGTGCCGGTGACCGGCATCGCCGAAGC 

GGTGTTTGCCCGCGCTCTCTCGGGATCCGTGGGGCAACGCTCGGCCGCCAGCGGTCTGGCTTC 

GGGCAAGCTCGGCGAGCAGCCCGCCGACCCCGCCACGTTCACCGAAGACGTCCGCCAGGCGT 

TGTACGCCTCCAAGATCGTGGCCTACGCTCAGGGCTTCAACCAGATCCAGGCCGGCAGCGCCG 

AATTCGGCTGGGACATCACGCCGGGCGACCTGGCCACCATCTGGCGTGGCGGCTGCATCATCC 

GGGCGAAGTTCCTCAACCACATCAAGGAAGCCTTTGACGCCAGCCCGAACCTGGCCAGTCTGA 

TTGTGGCCCCGTATTTCCGCGGCGCCGTCGAATCGGCGATCGACAGTTGGCGGCGTGTGGTGT 

CGACGGCGGCCCAACTGGGTATCCCGACCCCGGGATTCTCGTCGGCCCTGTCGTATTACGACG 

CGCTGCGCACCGCGCGGCTGCCCGCTGCACTCACCCAGGCCCAGCGCGACTTCTTCGGCGCA 

CACACCTACGGCCGGATCGACGAACCAGGCAAGTTCCACACACTATGGAGTTCAGACCGCACC 

GAAGTACCGGTGTAG 

>Rv1900c lipJ TB.seq 2146246:2147631 MW:49685 
>emb|AL123456|MTBH37RV:c2147631-2146243. lipJ SEQ ID NO:63

GTGGCGCAGGCTCCCCACATTCACAGGACCCGCTACGCAAAATGCGGCGACATGGATATCGCC 

TACCAGGTGCTGGGTGACGGTCCGACGGATCTGCTGGTGTTGCCGGGGCCGTTCGTGCCGATC 

GACTCGATCGACGACGAGCCATCGCTGTACCGTTTCCATCGCCGTCTTGCGTCATTCAGCAGGG 

TGATCCGCCTCGACCATCGTGGGGTCGGCCTGTCGTCACGGCTCGCCGCGATAACCACGCTGG

GGCCGAAGTTCTGGGCCCAGGACGCGATCGCGGTGATGGACGCGGTCGGATGCGAGCAGGCG 

ACAATTTTCGCGCCCAGTTTCCACGCCATGAACGGACTTGTTCTCGCCGCCGACTACCCCGAGC 

GGGTGCGCAGCCTGATCGTCGTCAACGGCTCGGCGCGCCCACTATGGGCGCCCGACTACCCG 

GTAGGCGCCCAGGTTCGTCGAGCTGACCCGTTCCTGACGGTGGCGCTGGAACCGGATGCCGTC 

GAGCGGGGCTTCGACGTGCTGAGCATCGTGGCTCCTACCGTGGCCGGAGATGACGTGTTTCGA 

GCCTGGTGGGATCTCGCCGGCAACCGTGCCGGACCGCCGAGCATTGCCCGTGCCGTTTCAAAG 

GTCATAGCCGAGGCCGACGTACGAGATGTCTTGGGACACATCGAGGCTCCAACACTGATCTTGC 

ACCGTGTCGGATGGACGTACATCCCGGTGGGACATGGTCGCTACCTCGCCGAGCACATCGCTG 

GATCCCGCTTGGTCGAACTACCCGGCACCGATACCCTGTACTGGGTTGGCGACACCGGGCCGA 

TGCTCGATGAAATCGAGGAATTCATCACCGGCGTGCGCGGCGGCGCTGACGCCGAGCGCATGC 

TTGCCACCATCATGTTTACCGACATCGTCGGCTCGACCCAGCACGCCGCCGCGCTCGGCGACG 

ACCGATGGCGCGACCTGTTGGACAACCACGACACCATCGTGTGCCACGAAATCCAGCGGTTCG 

GCGGTCGCGAAGTGAACACGGCCGGTGACGGTTTCGTCGCGACGTTCACCAGTCCGAGTGCC 

GCGATCGCGTGCGCGGACGACATCGTCGACGCGGTCGCCGCGCTGGGTATTGAGGTCCGGAT 

CGGTATTCATGCGGGCGAGGTCGAGGTGCGCGATGCCTCGCACGGTACCGACGTCGCCGGCG 

TGGCCGTGCATATCGGTGCGCGCGTCTGCGCGCTGGCCGGACCCAGTGAGGTGCTGGTGTCC 

TCGACCGTGCGAGACATCGTCGCCGGATCACGGCACCGGTTGGCCGAGCGTGGTGAGCAGGA 
ACTCAAGGGCGTACCGGGCAGATGGCGGCTATGCGTGCTCATGCGCGACGACGCCACCCGCA
CGCGCTAA 

>Rv1967 - TB.seq 2210599:2211624 MW:36516 

>emb|AL123456|MTBH37RV:2210599-2211627. Rv1967 SEQ ID NO:64

ATGAGGGAGAACCTGGGGGGCGTCGTGGTGCGCCTCGGCGTCTTCCTGGCGGTATGCCTGCT 

GACGGCGTTCCTGCTGATTGCCGTCTTCGGGGAGGTGCGCTTCGGCGACGGCAAGACCTACTA 

CGCCGAGTTCGCCAACGTGTCCAATCTGCGAACGGGCAAGCTGGTGCGCATCGCCGGCGTCGA 

GGTCGGCAAGGTCACCAGGATCTCCATCAACCCCGACGCGACGGTGCGGGTGCAGTTCACCGC 

CGACAACTCGGTCACCCTCACGCGGGGCACCCGGGCGGTGATCCGCTACGACAACCTGTTCGG 

TGACCGCTATTTGGCGCTGGAGGAAGGGGCCGGCGGACTCGCCGTTCTTCGTCCCGGTCACAC 

GATTCCGTTGGCGCGCACCCAACCGGCGTTGGATCTGGATGCCCTGATCGGTGGATTCAAGCC 

GCTGTTTCGTGCGCTGAACCCCGAGCAGGTCAACGCGCTGAGCGAACAGTTGCTGCACGCGTT 

TGCCGGACAGGGGCCCACGATCGGGTCATTGCTGGCCCAGTCCGCGGCCGTGACCAACACCC 

TGGCCGACCGTGATCGGCTGATCGGGCAGGTGATCACCAACCTCAACGTGGTGCTGGGCTCGC 

TGGGCGCTCACACCGATCGGTTGGACCAGGCGGTGACGTGGCTATCAGCGTTGATTCACCGGC 

TCGCGCAACGCAAGACCGACATCTCCAACGCCGTGGCCTACACCAACGCCGCCGCCGGCTCG 

GTCGCCGATCTGCTGTCGCAGGCTCGCGCGCCGTTGGCGAAGGTGGTTCGCGAGACCGATCG 

GGTGGCCGGCATCGCGGCCGCCGACGACGACTACCTCGACAATCTGCTCAACACGCTGCCGGA 

CAAATACCAGGCGCTGGTCCGCCAGGGTATGTACGGCGACTTCTTCGCCTTCTACCTGTGCGAC 

GTCGTGCTCAAGGTCAACGGCAAGGGCGGCCAGCCGGTGTACATCAAGCTGGCCGGTCAGGA 

CAGCGGGCGGTGCGCGCCGAAATGA 

>Rv1975 - TB.seq 2218050:2218712 MW:23650 

>emb|AL123456|MTBH37RV:2218050-2218715. Rv1975 SEQ ID NO:65

ATGTCGCGTCGAGCATCGGCCACGTGTGCCTTGTCCGCGACCACCGCCGTCGCCATAATGGCT 

GCTCCCGCCGCACGGGCCGACGACAAGCGGCTCAACGACGGCGTGGTCGCCAACGTCTACAC 

CGTTCAACGTCAGGCCGGCTGCACCAACGACGTCACGATCAACCCGCAACTACAATTGGCCGC 

CCAATGGCACACCCTCGATCTGCTGAAGAACCGGCACCTCAACGACGACACCGGTTCTGACGG 

ATCCACACCGCAAGACCGCGCGCATGCCGCCGGCTTCCGCGGGAAAGTCGCTGAAACCGTGG 

CGATCAATCCCGCCGTAGCGATCAGCGGCATCGAGTTGATAAACCAGTGGTACTACAACCCCGC 

GTTTTTCGCGATCATGTCCGACTGCGCCAACACCCAGATCGGGGTGTGGTCAGAAAACAGCCC 

GGATCGCACCGTCGTGGTGGCCGTTTACGGACAGCCCGATCGACCTTCCGCGATGCCGCCCAG 

GGGAGCGGTAACCGGACCGCCGTCCCCGGTGGCCGCGCAAGAGAACGTTCCTATCGACCCCA 

GCCCCGACTACGACGCCAGCGACGAGATCGAATACGGCATCAACTGGCTGCCATGGATCCTGC 

GCGGCGTGTACCCGCCGCCCGCAATGCCGCCGCAGTAG 
>Rv1981c nrdF ribonucleotide reductase small subunit TB.seq 2224221:2225186 MW:36591
>emb|AL123456|MTBH37RV:c2225186-2224218. nrdF SEQ ID NO:66

ATGACCGGCAAGCTCGTTGAGCGGGTGCACGCAATCAATTGGAACCGGTTGCTCGATGCTAAA 
GATTTGCAGGTCTGGGAACGTTTGACCGGTAACTTTTGGTTGCCGGAAAAGATTCCGCTCTCCA 

ACGACCTGGCATCTTGGCAAACGTTGAGTTCCACCGAGCAGCAGACGACGATCCGGGTGTTCA
CCGGCTTGACCCTGCTCGACAGCGCGCAGGCGACGGTGGGAGCAGTGGCCATGATCGACGAC 
GCGGTCACCCCCCACGAAGAGGCX3GTCCTGACCAACATGGCGTTCATGGAGTCAGTGCACGCC 
AAGAGCTACAGCTCGATCTTCTCGACCCTGTGCTCGACCAAGCAGATCGACGATGCCTTCGACT 
GGTCGGAACAGAACCCTTACCTGCAGCGAAAAGCGCAGATCATCGTCGACTACTACCGCGGTG 

ACQACGCGCTCAAGCGCAAAGCATCGTCGGTAATGCTGGAGTCCTTCCTGTTCTACTCCGGCTT
CTACCTGCCCATGTACTGGTCGTCGCGGGGTAAGCTCACCAACACCGCCGATCTGATCCGGCT 
GATCATCCGAGATGAAGCCGTCCACGGCTACTACATCGGCTACAAATGTCAACGAGGTTTGGGC 
GACCTGACCGACGCCGAGCGGGCCGACCACCGCGAATACACCTGCGAGCTGCTGCACACGCT 
CTACGCGAACGAGATCGACTATGCGCACGACTTGTACGACGAGTTGGGCTGGACCGACGACGT 

TTTGCCCTACATGCGTTACAACGCCAACAAGGCGCTAGCCAACCTGGGATACCAGCCTGCATTC
GATCGTGACACCTGCCAGGTGAACCCGGCCGTGCGCGCAGCTCTCGACCCCGGTGCAGGGGA 
GAACCACGACTnTTCTCCGGCTCCGGAAGCTCATACGTAATGGGCACCCACCAACCCACCACC 
GACACCGACTGGGACTTCTAA 

>Rv2092c helY helicase, Ski2 subfamily TB.seq 2349335:2352052 MW:99576
>emb|AL123456|MTBH37RV:c2352052-2349332. helY SEQ ID NO:67

GTGACTGAGCTGGCCGAGCTGGACCGGTTCACCGCGGAACTACCGTTCTCGCTCGACGACTTT 
CAGCAGCGGGCTTGCAGCGCGCTGGAACGCGGCCACGGTGTGCTGGTGTGCGCGCCGACCG 
GCGCTGGCAAGACGGTGGTCGGCGAGTTCGCCGTGCACCTGGCGCTGGCGGCCGGCAGTAAA 

TGTTTCTACACCACGCCGCTGAAAGCCCTGAGCAACCAAAAGCACACCGATCTCACAGCACGCT
ACGGCCGTGACCAGATCGGGCTGCTGACCGGTGACCTGTCGGTCAACGGCAACGCGCCGGTG 
GTGGTGATGACCACCGAAGTGCTGCGCAACATGCTCTACGCGGATTCGCCTGCGCTGCAGGGG 
CTTTCCTATGTGGTGATGGATGAGGTGCATTTCCTCGCCGACCGGATGCGGGGTCCGGTGTGG 
GAGGAGGTGATCCTGCAACTGCCCGACGACGTGCGGGTGGTCAGCCTGTCGGCGACGGTGAG 

CAACGCCGAGGAGTTCGGCGGTTGGATCCAGACGGTGCGGGGCGACACCACGGTGGTGGTCG
ACGAGCATCGGCCGGTGCCGTTGTGGCAACACGTCTTGGTGGGCAAGCGCATGTTCGACCTGT 
TCGATTACXJGGATCGGCGAAGCCGAAGGGCAGCCCCAAGTCAACCGCGAGTTGCTGCGCCACA 

tcgcgcatcgccgtgaggccgaccggatggccgattggcagcctcggcgccgaggctcgggc 
cggcccggcttctaccggccacccggccgacccgaggtgatcgccaaactcgacgctgaagg 
gctgttgccggcgatcaccttcgtgttctcccgggcx:ggttgtgacgccgcggtcacccaatg
cctgcggtcaccgctgcggttgaccagcgaagaggagcgcgcacggatcgccgaggtgatcg 
accaccgctgcggtgacctggccgactccgacctggcggtactcggctactacgaatggcgg 
GAAGGGTTACTGCGCGGTCTGGCCGCCCACCACGCGGGCATGTTGCCGGCCTTCCGGCACAC

GGTGGAGGAGCTGTTCACCGCCGGTTTGGTCAAGGCTGTATTCGCCACCGAGACTCTGGCGCT 

CGGTATCAACATGCCGGGCCGCACGGTGGTGCTGGAGCGGCTGGTGAAGTTCAACGGTGAGCA 

GCACATGCCGCTGACGCCGGGGGAGTACACCCAACTGACCGGTCGCGCCGGCCGGCGCGGTA 

TCGACGTCGAGGGTCACGCGGTGGTGATCTGGCACCCGGAAATTGAACCGTCCGAGGTGGCG 

GGCCTGGCCTCCACCCGCACCTTTCCGCTGCGCAGCTCGTTTGCCCCGTCGTACAACATGACG 

ATCAACCTGGTGCACCGGATGGGTCCGCAACAGGCGCACCGACTGCTCGAGCAGTCGTTCGCC 

CAATATCAGGCCGACCGATCCGTGGTCGGACTGGTCCGCGGAATTGAGCGGGGCAACAGGATA 

CTCGGCGAGATCGCAGCCGAACTGGGCGGATCTGATGCGCCCATCCTCGAATACGCTCGATTG 

CGCGCGCGGGTGTCXGAGCTGGAACGTGCGCAGGCCCGCGCGTCGCGGTTACAGCGACGGC 

AGGCGGCCACCGATGCGCTGGCCGCGCTGCGCCGCGGTGACATCATCACCATCACCCACGGC 

CGCCGCGGTGGTCTGGCCGTCGTCCTGGAATCAGCCCGCGACCGCGACGACCCGCGTCCGCT 

GGTGCTAACCGAACACCGATGGGCGGGACGGATCTCCTCGGCCGACTACTCGGGCACGACGC 

CGGTGGGGTCGATGACGCTGCCCAAGCGGGTGGAGCACCGCCAGCCGCGGGTCCGGCGTGA 

CCTGGCCTCGGCGCTGCGATCGGGAGCCGCGGGTCTGGTTATTCCAGCCGCCCGGCGCGTCA 

GCGAGGCCGGCGGGTTTCACGATCCGGAGCTGGAGTCGTCGCGCGAACAATTGCGCCGTCAT 

CCGGTGCATACCTCGCCCGGGCTCGAGGACCAGATCCGCCAGGCCGAGCGTTACTTACGCATC 

GAACGCGACAACGCGCAATTAGAGAGGAAGGTCGCCGCCGCCACCAACTCGTTGGCCCGCAC 

GTTCGACCGATTCGTCGGGCTGCTCACCGAACGGGAGTTCATCGATGGCCCGGCCACTGATCC 

CGTGGTCACCGACGACGGCCGGCTGCTGGCGCGGATTTACAGCGAGAGCGACCTGTTGGTGG 

CCGAGTGCCTACGTACAGGTGCGTGGGAGGGTTTAAAGCCGGCCGAATTGGCGGGGGTGGTG 

TCGGCGGTGGTCTACGAGACGCGCGGTGGTGACGGCCAGGGCGCCCCGTTCGGAGCCGATGT 

GCCCACACCGCGGTTACGGCAGGCTCTGACTCAGACATCAAGGCTGTCCACGACATTGCGCGC 

CGACGAGCAGGCACACCGCATCACCCCGAGTCGCGAACCCGACGATGGCTTTGTCAGAGTCAT 

CTACCGCTGGTCGCGAACCGGTGATCTAGCGGCGGCATTGGCCGCTGCCGACGTGAACGGCA 

GCGGATCACCGTTATTGGCAGGGGATTTCGTGCGTTGGTGCCGTCAGGTGCTCGATCTGCTGG 

ACCAAGTTCGTAACGCTGCGCCCAACCCCGAACTGCGGGCTACCGCAAAGCGCGCTATCGGTG 

ACATTCGGCGCGGCGTCGTCGCGGTTGACGCCGGGTAG 

>Rv2101 helZ helicase, Snf2/Rad54 family TB.seq 2360238:2363276 MW:111632
>emb|AL123456|MTBH37RV:2360238-2363279. helZ SEQ ID NO:68 

ATGGTGGTTTTGCACGGCTTCTGGTCCAACTCCGGCGGGATGCGGCTGTGGGCGGAGGACTCC 

GATCTGCTGGTGAAGAGCCCGAGTCAGGCGCTGCGCTCCGCGCGGCCACACCCGTTCGCGGC 

GCCCGCTGACCTGATCGCCGGCATACATCCGGGCAAACCCGCAACCGCCGTTTTGCTGTTGCC 

GTCGTTGCGATCGGCGCCGCTGGACTCGCCGGAGCTGATCCGGCTCGCCCCGCGCCCGGCCG 

CGCGAACCGATCCGATGCTGTTGGCGTGGACGGTACCGGTGGTGGACCTGGACCCCACCGCG 

GCGTTGGCCGCCTTCGACCAGCCCGCCCCCGACGTCCGCTACGGCGCGTCCGTCGACTACCT 
GGCCGAGCTGGCCGTTTTCGCGCGCGAGTTGGTCGAGCGTGGTCGCGTGCTGCCCCAGCTGC
GCCGCGACACCCACGGCGCGGCCGCCTGCTGGCGTCCGGTGTTGCAGGGACGCGACGTGGTC 
GCGATGACCTCGCTGGTCTCGGCGATGCCGCCGGTCTGCCGCGCCGAAGTTGGTGGGCACGA 
CCCGCACGAACTGGCAACCTCGGCTCTGGACGCGATGGTCGACGCCGCCGTGCGCGCGGCGC 

TGTCACCGATGGACCTGCTGCCCCCGCGACGGGGTCGCTCCAAACGGCATCGGGCCGTGGAG
GCTTGGCTGACCGCGTTGACCTGCCCGGACGGCCGGTTCGACGCGGAGCCCGACGAACTCGA 
CGCGCTGGCCGAGGCGTTGCGGCCATGGGACGACGTCGGTATCGGCACCGTCGGCCCGGCGC 
GGGCGACGTTTCGGCTGTGCGAAGTCGAGACCGAAAACGAGGAGACGCCCGCGGGCTCGTTG 
TGGAGGCTGGAGTTCTTATTGCAGTCGACGCAGGACCCCAGCCTGCTGGTCCCCGCCGAGCAG 

GCATGGAACGACGACGGCAGCCTGCGCCGCTGGCTGGACGGGCCGCAGGAGCTGCTGCTGAC
CGAACTGGGCCGGGCCTCTCGGATnTCCCCGAGCTCGTCCCGGCGCTGCGCACCGCGTGCC 
CGTCCGGGCTTGAGCTCGACGCCGACGGCGCCTACCGATTCCTGTCGGGTACGGCCGCGGTG 
CTCGACGAGGCTGGGTTTGGCGTGCTGCTGCCGTCCTGGTGGGACCGCCGCCGCAAGCTGGG 
CTTGGTCCTGTCCGCATATACCCCGGTCGACGGCGTGGTGGGCAAGGCCAGCAAGTTCGGCCG 

CGAGCAGCTCGTCGAGTTCCGCTGGGAGCTGGCCGTGGGCGACGATCCGCTCAGCGAGGAGG
AGATCGCGGCGCTGACCGAAACCAAGTCCCCGCTGATCCGGGTGCGTGGCCAGTGGGTCGCG 
CTCGATACCGAACAGATGCGCCGCGGGCTGGAGTTTTTGGAGCGTAAGCCAACCGGCCGCAAG 
ACCACCGCCGAGATCCTCGCGCTGGCCGGCAGCCACCCCGACGACGTGGACACCCCGCTCGA 
GGTCACCGCCGTACGCGCCGACGGGTGGCTCGGGGACCTGCTCGCCGGGGCCGCCGCGGCG 

TCGCTGCAGCCGTTGGACCCGCCCGACGGATTCACCGCGACGCTGGGTCCCTACCAGCAGCGC
GGTCTGGCGTGGCTGGCGTTTTrGTCCTCGCTCGGTTTGGGCAGCTGCCTGGCCGACGACATG 
GGCCTGGGCAAGACGGTGCAGCTATTGGCCCTGGAAACCTTGGAATCCGTTGAGCGCCACCAG 
GATCGCGGCGTCGGACCCACACTGCTACTGTGCCCGATGTCGTTGGTGGGCAACTGGCCGCAG 
GAAGCGGCCAGGTTTGCACCCAACCTGCGGGTGTACGCCCACCACGGGGGCGCCCGGCTGCA 

CGGCGAGGCGTTGCGCGACCACGTCGAGCGCACCGACCTGGTCGTGAGCACCTATACCACCG
CCACCCGCGACATCGACGAGCTGGCGGAATACGAATGGAACCGGGTGGTGCTGGACGAGGCC 
CAGGCGGTGAAGAACAGCCTGTCCCGGGCGGCCAAGGCGGTGCGACGGCTACGCGCGGCGC 
ACCGGGTCGCGCTGACCGGGACACCGATGGAGAACCGGCTCGCCGAGCTGTGGTCGATCATG 
GACTTCCTCAACCCGGGCCTGCTCGGATCCTCCGAACGCTTCCGCACCCGCTACGCGATCCCG 

ATCGAGCGGCACGGGCACACCGAACCGGCCGAACGGCTGCGCGCATCGACGCGGCCCTACAT
CCTGCGCCGGCTCAAGACCGACCCGGCGATCATCGACGATCTGGCGGAGAAGATCGAGATCAA 
GCAGTACTGCCAACTCACCACCGAGCAGGCGTCGCTGTATCAGGCCGTCGTCGCCGACATGAT 
GGAAAAGATCGAAAACACCGAAGGGATCGAGCGGCGCGGCAACGTGCTGGCCGCGATGGCCA 
AGCTCAAACAGGTGTGCAACCACCGCGCCCAGCTGCTGCACGATCGCTCCCCGGTCGGTCGGC 

GGTCCGGGAAGGTGATCCGGCTCGAGGAGATCCTGGAAGAGATCCTGGCCGAGGGGGACCGG
GTGCTGTGTTTTACCCAGTTCACCGAGTTCGCCGAGCTGCTGGTGCCGCACCTGGCCGCACGC
TTCGGCCGTGCCGCCCGAGACATTGCCTACCTGCACGGTGGCACCCCGAGGAAGCGGCGTGA
CGAGATGGTGGCCCGGTTCCAGTCCGGTGACGGCCCGCCCATTTTTCTGCTGTCGTTGAAGGC

GGGCGGTACCGGGCTGAACCTCACCGCCGCCAATCATGTTGTGCACCTGGACCGCTGGTGGAA 

CCCGGCGGTCGAGAACCAGGCGACGGACCGGGCGTTTCGGATCGGGCAGCGGCGCACGGTG 

CAGGTCCGCAAGTTCATCTGCACCGGCACCCTCGAGGAGAAGATCGACGAAATGATCGAGGAG 

AAAAAGGCGCTGGCCGACTTGGTGGTCACCGACGGCGAAGGCTGGCTGACCGAACTGTCCACC 

CGCGATCTGCGCGAGGTGTTCGCGCTGTCCGAAGGCGCCGTCGGTGAGTAG 

>Rv2110c prcB proteasome [beta]-type subunit 2 TB.seq 2369727:2370599 MW:30274
>emb|AL123456|MTBH37RV:c2370599-2369724. prcB SEQ ID NO:69 

GTGACCTGGCCGTTGCCCGATCGCCTGTCCATTAATTCACTCTCTGGAACACCCGCTGTAGACC 

TATCTTCTTTCACTGACTTCCTGCGCCGCCAGGCGCCGGAGTTGCTGCCGGCAAGCATCAGCG 

GCGGTGCGCCACTCGCAGGCGGCGATGCGCAACTGCCGCACGGCACCACCATTGTCGCGCTG 

AAATACCCCGGCGGTGTTGTCATGGCGGGTGACCGGCGTTCGACGCAGGGCAACATGATTTCT 

GGGCGTGATGTGCGCAAGGTGTATATCACCGATGACTACACCGCTACCGGCATCGCTGGCACG 

GCTGCGGTCGCGGTTGAGTTTGCCCGGCTGTATGCCGTGGAACTTGAGCACTACGAGAAGCTC 

GAGGGTGTGCCGCTGACGTTTGCCGGCAAAATCAACCGGCTGGCGATTATGGTGCGTGGCAAT 

CTGGCGGCCGCGATGCAGGGTCTGCTGGCGTTGCCGTTGCTGGCGGGCTACGACATTCATGCG 

TCTGACCCGCAGAGCGCGGGTCGTATCGTTTCGTTCGACGCCGCCGGCGGTTGGAACATCGAG 

GAAGAGGGCTATCAGGCGGTGGGCTCGGGTTCGCTGTTCGCGAAGTCGTCGATGAAGAAGTTG 

TATTCGCAGGTTACCGACGGTGATTCGGGGCTGCGGGTGGCGGTCGAGGCGCTCTACGACGCC 

GCCGACGACGACTCCGCCACCGGCGGTCCGGACCTGGTGCGGGGCATCTTTCCGACGGCGGT 

GATCATCGACGCCGACGGGGCGGTTGACGTGCCGGAGAGCCGGATTGCCGAATTGGCCCGCG 

CGATCATCGAAAGCCGTTCGGGTGCGGATACTTTCGGCTCCGATGGCGGTGAGAAGTGA 

>Rv2118c - = B2126_G1_165 (83.6%) TB.seq 2377471:2378310 MW:30091
>emb|AL123456|MTBH37RV:c2378310-2377468. Rv2118c SEQ ID NO:70

GTGTCAGCAACCGGCCCATTCAGCATCGGCGAACGTGTTCAGCTCACCGACGCTAAGGGGCGC 

CGCTACACCATGTCGCTGACTCCCGGTGCCGAATTCCACACTCATCGTGGCTCGATCGCCCACG 

ACGCGGTGATCGGGTTGGAGCAAGGCAGCGTGGTCAAATCCAGCAACGGCGCCCTGTTCCTGG 

TGCTGCGCCCGCTGCTGGTCGACTACGTCATGTCGATGCCGCGCGGCCCGCAGGTGATCTATC 

CCAAAGATGCGGCCXAGATCGTGCATGAGGGCGACATATTTCCCGGCGCGCGGGTGCTGGAG 

GCAGGAGCCGGATCCGGTGCTCTGACCTTGTCTTTGCTGCGGGCGGTTGGGCCGGCCGGACA 

GGTGATCTCCTACGAACAGCGCGCCGATCATGCCGAACACGCCCGGCGCAATGTGAGCGGCTG 

CTACGGCCAGCCGCCGGACAACTGGCGACTGGTCGTCAGCGACCTCGCCGACTCCGAACTGC 

CCGAGGGATCCGTTGATCGGGCCGTGCTCGACATGCTGGCGCCGTGGGAGGTGCTCGACGCG 

GTATCGCGGCTGCTGGTCGCCGGCGGAGTGCTGATGGTCTACGTGGCCACCGTCACTCAGCTG 

TCGAGGATCGTGGAGGCACTGCGGGCCAAGCAGTGCTGGACCGAACCGAGAGCCTGGGAGAC
GCTGCAGCGGGGCTGGAACGTCGTAGGGTTGGCGGTTCGGCCGCAGCATTCGATGCGCGGGC
ATACCGCGTTCCTGGTAGCAACGCGCCGGTTGGCGCCGGGGGCTGTGGCTCCGGCGCCGCTA 
GGTCGTAAGCGCGAGGGACGCGACGGGTAG 

>Rv2144c - TB.seq 2404166:2404519 MW:12028
>emb|AL123456|MTBH37RV:c2404519-2404163. Rv2144c SEQ ID NO:71

ATGCTGATCATTGCGCTGGTCTTGGCCXJTGATTGGGCTCCTGGCCTTGGTGTTCGCGGTGGTCA 
CCAGCAACCAGCTAGTGGCCTGGGTATGCATCGGGGCCAGCGTGCTGGGTGTGGCGTTGCTGA 
TCGTCGATGCGTTGCGAGAACGCCAGCAAGGTGGCGCGGACGAAGCTGATGGGGCTGGGGAA 
ACGGGTGTCGCGGAGGAAGCCGACGTCGACTACCCGGAGGAAGCCCCCGAGGAGAGCCAAGC
CGTCGACGCCGGTGTCATCGGCAGTGAGGAGCCATCGGAGGAGGCCAGCGAAGCGACCGAGG 
AGTCGGCGGTATCGGCGGACCGAAGCGACGACAGCGCCAAGTAG 

>Rv2146c - TB.seq 2405667:2405954 MW:10805 
>emb|AL123456|MTBH37RV:c2405954-2405664. Rv2146c SEQ ID NO:72

TTGGTGGTGTTTTTTCAGATCCTTGGGTTCGCGCTGTTCATCTTCTGGCTGCTGCTGATCGCTCG 
GGTCGTCGTTGAGTTCATCCGCTCGTTCAGCCGTGACTGGCGTCCCACCGGTGTCACCGTGGT 
GATCTTGGAGATCATCATGTCGATCACTGATCCGCCGGTGAAGGTGCTGCGCCGGCTGATCCC 
GCAACTCACGATCGGCGCGGTCCGGTTCGACCTGTCGATCATGGTGCTGCTGCTGGTTGCGTT 
CATCGGTATGCAACTGGCGTTTGGTGCTGCGGCCTGA

>Rv2147c - TB.seq 2406119:2406841 MW:27630
>emb|AL123456|MTBH37RV:c2406841-2406116. Rv2147c SEQ ID NO:73

GTGAATAGTCACTGTAGTCACACCTTCATCACAGACAACAGATeTCCCAGGGCTAGAAGGGGTG 
ACGCAATGAGCACACTGCACAAGGTCAAGGCCTACTTCGGTATGGCTCCCATGGAGGATTACGA
CGACi3AGTACTACGACGACCGCGCTCX:CTCGCGCGGGTATGCGCGGCCCCGATTCGACGACG 
ACTACGGCCGCTACGATGGGCGCGACTACGACGACGCGCGCAGCGATTCACGCGGTGACCTG 
CGCGGTGAGCCGGCCGACTATCGACCACCGGGATATCGCGGCGGGTACGCGGACGAACCACG 
TTTCCGGCCCCGGGAGTTCGACCGCGCGGAGATGACACGGCCGCGCTTCGGATCGTGGCTGC 
GCAACTCCACCCGCGGCGCGCTAGCGATGGACCCCCGCCGGATGGCGATGATGTTCGAGGAT
GGCCATCCGCTCTCGAAGATCACCACGGTGCGGCGCAAGGACTACAGCGAGGCTCGCACCATC 
GGTGAGCGGTTCCGCGACGGCAGCCCGGTCATCATGGATCTGGTGTCGATGGACAACGCCGAT 
GCCAAGCGGCTGGTCGATTTCGCGGCCGGCCTGGCCTTCGCGCTGCGCGGCTCGTTCGACAA 
GGTCGCGACCAAGGTGTTCCTGCTCTCGCCTGCAGACGTCGATGTGTCCCCCGAGGAGCGCCG 
GAGGATCGCCGAAACCGGGTTCTACGCCTACCAATAG

>Rv2148c - TB.seq 2406841:2407614 MW:27694
>emb|AL123456|MTBH37RV:c2407614-2406838. Rv2148c SEQ ID NO:74

ATGGCGGCGGATCTTTCGGCGTATCCAGACCGCGAATCGGAATTGACGCATGCGTTGGCGGCA 

ATGCGATCGCGACTTGCGGCGGCCGCGGAGGCGGCGGGTCGCAATGTCGGCGAAATTGAACT 

TCTACCGATTACCAAATTCTTTCCAGCAACCGATGTTGCGATTTTGTTTCGATTGGGTTGTCGGTC 

CGTTGGCGAATCGCGCGAACAGGAAGCTTCAGCCAAGATGGCCGAACTTAATCGGTTGTTGGC 

GGCTGCCGAGTTGGGTCACTCGGGGGGTGTGCACTGGCACATGGTGGGCCGGATTCAACGCA 

ACAAAGCCGGGTCGCTGGCTCGCTGGGCGCACACCGCTCACTCGGTGGACAGCTCGCGGTTG 

GTGACCGCGCTGGATCGGGCGGTTGTTGCGGCGCTGGCCGAACACCGTCGTGGCGAGCGGCT 

GCGGGTTTACGTCCAGGTCAGCCTCGACGGTGACGGATCCCGGGGCGGCGTCGACAGCACGA 

CGCCCGGCGCCGTAGACCGGATTTGCGCGCAGGTGCAGGAGTCAGAGGGCCTCGAACTGGTC 

GGGTTGATGGGCATTCCGCCGCTGGATTGGGACCCGGACGAGGCCTTTGACCGGCTGCAATCG 

GAGCACAACCGGGTGCGTGCGATGTTCCCGCACGCGATCGGTCTGTCGGCGGGCATGTCCAAC 

GACCTTGAAGTCGCCGTCAAACATGGTTCGACCTGTGTGCGTGTCGGTACCGCGCTATTGGGTC 

CGCGGCGGTTACGGTCACCGTGA 

>Rv2150c ftsZ TB.seq 2408386:2409522 MW:38757 
>emb|AL123456|MTBH37RV:c2409522-2408383. ftsZ SEQ ID NO:75 

ATGACCCCCCCGCACAACTACGTGGCCGTCATCAAGGTCGTGGGTATCGGTGGTGGCGGTGTC 

AACGCCGTCAACCGAATGATCGAGCAGGGCCTCAAAGGCGTGGAATTCATCGCGATCAACACC 

GACGCCCAGGCGTTGTTGATGAGCGATGCCGACGTCAAACTCGACGTCGGCCGCGACTCCACC 

CGCGGGCTGGGCGCCGGCGCCGATCCGGAGGTCGGCCGTAAGGCCGCCGAGGACGCCAAGG 

ACGAGATCGAAGAGCTGCTGCGCGGTGCCGACATGGTGTTTGTCACCGCCGGCGAGGGGGGC 

GGAACCGGCACCGGGGGGGCACCCGTCGTCGCCAGCATCGCCCGCAAGCTGGGCGCGTTGAC 

CGTCGGTGTGGTCACCCGGCCGTTCTCGTTCGAGGGCAAGCGACGCAGCAATCAGGCCGAAAA 

TGGCATCGCGGCGCTGCGGGAGAGTTGCGACACCCTCATCGTGATTCCCAACGACCGGTTGCT 

GCAGATGGGAGATGCCGCGGTATCGCTGATGGATGCT7TCCGTAGCGCCGACGAGGTGCTGCT 

CAACGGCGTGCAGGGCATCACCGACCTGATTACCACCCCGGGTCTAATCAACGTCGACTTCGC 

CGACGTCAAGGGCATCATGTCCGGTGCCGGCACCGCACTGATGGGCATCGGCTCGGCCCGGG 

GCGAAGGCCGGTCGCTCAAAGCGGCCGAGATCGCCATCAACTCGCCGTTGCTGGAAGCCTCGA 

TGGAGGGCGCGCAAGGCGTGCTGATGTCGATCGCCGGCGGCAGCGACTTGGGCtTGTTCGAG 

ATCAACGAGGCGGCCTCGTTGGTACAAGACGCCGCTCACCCCGATGCCAACATCATCTTCGGC 

ACCGTCATCGACGATTCGCTCGGTGACGAGGTGCGGGTGACCGTGATCGCGGCCGGCTTCGAC 

GTCAGCGGTCCCGGCCGCAAGCCGGTGATGGGCGAGACCGGCGGCGCCCACCGGATCGAGT 

CAGCCAAGGCAGGCAAGCTCACCTCGACCTTGTTCGAGCCGGTCGACGCCGTCAGCGTGCCGT 

TGCACACCAACGGCGCAACCCTGAGCATCGGCGGTGATGACGACGATGTCGACGTGCCGCCCT 

TCATGCGCCGCTGA

>Rv2152c murC TB.seq 2410639:2412120 MW:51146
>emb|AL123456|MTBH37RV:c2412120-2410636. murC SEQ ID NO:76

GTGAGCACCGAGCAGTTGCCGCCCGATCTGCGGCGGGTGCACATGGTCGGCATCGGCGGAGC 

TGGCATGTCGGGCATCGCCCGAATCCTGCTGGACCGCGGCGGGCTGGTCTCCGGGTCAGACG 

CCAAGGAGTCGCGCGGTGTGCATGCGCTGCGGGCGCGGGGCGCGTTGATCCGGATCGGACAC 

GACGCGTCGTCGCTGGACCTGTTGCCCGGTGGCGCCACGGCGGTCGTCACTACCCATGCCGC 

CATCCCCAAAACCAACCCCGAGCTCGTCGAAGCGAGGCGCCGCGGCATTCCCGTGGTGCTGCG 

GCCGGCCGTGCTGGCCAAGTTGATGGCCGGGCGCACCACATTGATGGTCACCGGCACGCACG 

GCAAGACAACGACGACGTCCATGCTGATCGTCGCCCTGCAGCACTGCGGGqTTGACCCGTCCT 

TTGCGGTCGGCGGTGAGCTGGGGGAGGCCGGTACCAACGCCCATCACGGCAGTGGCGACTGT 

TTCGTCGCCGAAGCCGACGAAAGCGATGGCTCGCTGTTGCAGTACACACCCCACGTCGCGGTG 

ATCACCAACATCGAGTCCGATCACCTGGACTTCTACGGCAGCGTCGAGGCGTATGTTGCGGTGT 

TCGACTCXJTTCGTGGAGCGCATTGTCCCCGGGGGTGCGCTGGTGGTGTGCACTGACGACCCCG 

GAGGGGCCGCGCTGGCTCAGCGCGCGACTGAGCTGGGAATTCGAGTGCTGCGATACGGGTCG 

GTGCCGGGTGAGACCATGGCAGCCACGTTGGTCTCGTGGCAGCAACAGGGGGTCGGGGCGGT 

CGCACATATCCGGTTGGCCTCAGAACTAGCCACAGCACAGGGTCCCCGCGTGATGCGGGTGTC 

GGTGCCCGGGCGACACATGGCGCTCAACGCGCTGGGAGCGCTGCTGGCCGCGGTGCAGATCG 

GCGCCCCGGCCGACGAGGTGCTCGACGGGCTGGCCGGCTTCGAAGGAGTGCGGCGACGATTC 

GAACTGGTTGGGACCTGCGGCGTCGGAAAGGCGTCGGTGCGCGTGTTCGATGACTACGCCCAC 

CACCCGACGGAGATCAGCGCGACACTGGCGGCGGCGCGCATGGTGCTCGAACAGGGCGACGG 

TGGCCGCTGCATGGTTGTGTTTCAACCCCATTTGTATTCGCGGACAAAGGCATTCGCTGCTGAG 

TTTGGGCGTGCGCTGAATGCCGCTGACGAGGTGTTCGTACTCGACGTCTACGGAGCTCGTGAA 

CAACCGCTGGCCGGTGTCAGCGGAGCCAGCGTCGCTGAGCACGTCACTGTGCCGATGCGCTA 

CGTCCCGGATTTTTCGGCGGTCGCACAGCAAGTGGCCGCCGCCGCTAGTCCGGGCGACGTCAT 

CGTCACGATGGGTGCGGGAGACGTGACCTTGCTGGGCCCGGAAATCCTGACCGCCCTTCGGGT 

CCGGGCCAACCGAAGCGCCCGCGGCCGTCCGGGGGTGCTGGGATGA 

>Rv2153c murG TB.seq 2412120:2413349 MW:41829 
>emb|AL123456|MTBH37RV:c2413349-2412117. murG SEQ ID NO:77 

GTGAAGGACACGGTCAGCCAGCCGGCCGGCGGGCGCGGGGCAACGGCGCCCCGGCCCGCCG 

ATGCCGCCTCGCCGTCTTGTGGTTCCTCGCCGTCTGCTGATTCCGTGTCGGTCGTTCTCGCCGG 

CGGCGGGACCGCCGGGCACGTCGAGCCCGCCATGGCCGTCGCCGACGCCTTGGTCGCGTTGG 

ATCCGCGCGTCCGGATTACCX3CGTTGGGCACCCTCCGTGGACTAGAGACCAGGCTGGTGCCCC 

AGCGCGGCTACCACCTGGAGCTGATCACGGCGGTGCCGATGCCGCGCAAGCCCGGCGGCGAC 

CTGGCCCGGCTGCCGTCGCGGGTGTGGCGCGCCGTCCGGGAGGCCCGGGACGTGCTCGACX3 

ATGTGGACGCCGACGTCGTCGTCGGTTTCGGTGGGTACGTCGCGCTACCGGCTTACCTAGCCG 

CTCGCGGCCTGCCTTTGCCGCCCCGGCGCCGGCGCCGGATCCCGGTGGTGATCCACGAAGCC
AACGCCAGGGCGGGACTGGCCAACCGGGTCGGCGCCCATACCGCGGACCGGGTGCTCTCCGC

GGTGCCGGATTCCGGGCTGCGGCGCGCCGAGGTGGTTGGGGTCCCGGTCCGTGCGTCGATCG 

CCGCGCTGGACCGCGCGGTGCTGCGAGCCGAGGCGCGGGCACACTTCGGCTTCCCCGACGAC 

GCGCGGGTGCTGCTGGTGTTCGGGGGTTCGCAGGGCGCGGTCTCGCTCAACCGGGCGGTGTC 

CGGCGCCGCCGCCGACCTGGCCGCCGCCGGTGTTTGCGTGCTGCATGCCCATGGACCCCAGA 

ACGTGGTGGAGTTGCGCCGTCGGGCTCAAGGTGACCCACCGTACGTGGCGGTGCCCTATTTGG 

ACCGGATGGAGCTGGCCTACGCCGCCGCCGATCTGGTGATCTGCCGGGCCGGGGCGATGACG 

GTCGCCGAAGTATCCGCCGTCGGTCTGCCGGCCATCTACGTGCCGCTGCCGATCGGCAACGGT 

GAACAGCGGCTGAATGCGTTGCCGGTAGTCAATGCCGGCGGCGGCATGGTGGTCGCCGACGC 

CGCCCTGACCCCCGAGTTGGTGGCCCGCCAGGTTGCCGGGCTGCTCACCGACCCCGCGCGGC 

TGGCCGCGATGACCGCGGCCGCAGCCAGGGTGGGACATCGCGATGCCGCGGGCCAGGTGGC 

CCGGGCCGCGCTGGCCGTCGCCACGGGGGCCGGTGCCAGGACAACGACGTGA 



>Rv2154c ftsW TB.seq 2413349:2414920 MW:56306
>emb|AL123456|MTBH37RV:c2414920-2413346. ftsW SEQ ID NO:78

GTGCTAACCCGGTTGCTGCGTCGGGGCACCAGCGACACCGACGGCTCCCAGACTCGAGGGGC 
CGAGCCGGTCGAGGGGCAGCGGACGGGCCCGGAAGAAGCCTCTAACCCGGGTTCGGCGAGG 
CCCCGCACCCGTTTCGGTGCCTGGCTGGGCCGTCCGATGACCTCGTTTCACCTCATCATCGCC 
GTTGCCGCATTGCTGACCACCCTTGGACTGATCATGGTGCTGTCGGCATCGGCGGTGCGGTCC 

TACGACGACGACGGATCGGCTTGGGTGATCTTCGGCAAGCAGGTCTTGTGGACGCTTGTGGGT
CTTATCGGCGGCTATGTCTGTCTGCGGATGTCGGTGCGGTTCATGCGGCGCATCGCCTTCTCCG 
GTTTCGCGATCACCATCGTGATGCTGGTGCTGGTGCTGGTGCCGGGGATCGGCAAGGAGGCCA 
ACGGCTCGCGCGGCTGGTTCGTGGTCGCGGGCTTCTCGATGCAGCCCTCTGAGCTGGCTAAGA 
TGGGGTTCGCCATCTGGGGAGCGCATCTGCTGGCCGCCCGGCGCATGGAACGGGCTTCACTG 

CGCGAGATGCTGATTCCACTGGTGCCGGCCGCCGTCGTTGCGCTGGCGCTGATCGTGGCCCAG
CCCGACCTCGGACAGACCGTGTCGATGGGCATCATCTTGTTGGGCCTGCTGTGGTATGCGGGG 
CTGCCGCTGCGCGTCTTCCTCAGCTCACTGGCGGCGGTCGTCGTCTCGGCCGCCATCCTGGCG 
GTGTCCGCGGGCTACCGATCCGACCGGGTGCGGTCGTGGCTCAACCCCGAAAACGATCCGCAA 
GACTCCGGCTACCAGGCCCGACAGGCAAAGTTCGCGCTGGCTCAAGGTGGCATTTTCGGCGAC 

GGTCTGGGCCAAGGCGTGGCCAAGTGGAACTACTTGCCCAACGCCCACAACGACTTCATTTTCG

ccatcatcggcgaagagctgggtctcgtcggcgcgctcggactgctggggctattcggattgt 
tcgcctacaccggcatgcgcatcgctagccggtccgccgacccgttcctgcggctgctgaccg 
ccacx:acgacactgtgggtgctgggacaggcgttcatcaacatcggctatgtgatcgggctgc 
tgcccgtcaccggcctgcagctgccgctcatctccgccggtggaacctccacggccgcaacac 
tttcgctgataggcatcatcgccaacgcggctcgccacgaaccggaggcggtggccgcgctg
cgggctgggcgcgacgacaaggtgaaccggttgctgcggctgccgctgcccgagccgtatct
gccccctcgtctcgaggcgtttcgtgaccgcaagcgcgccaacccgcaaccggcccaaacgca
GCCCGCGCGGAAGACCCCCCGCACGGCGCCCGGACAGCCTGCCCGGCAGATGGGCCTGCCC

CCGCGACCCGGCTCGCCCCGCACGGCCGATCCGCCGGTTCGTCGATCAGTGCATCATGGAGCT 

GGCCAGCGGTACGCGGGCCAGCGTCGCACACGGCGCGTTCGGGCATTGGAAGGTCAGCGTTA 

CGGGTGA 

>Rv2155c murD TB.seq 2414935:2416392 MW:49314 
>emb|AL123456|MTBH37RV:c2416392-2414932. murD SEQ ID NO:79

GTGCTTGACCCTCTGGGGCCGGGTGCGCCCGTGTTGGTAGCCGGTGGCCGGGTGACCGGTCA 

GGCGGTGGCCGCGGTGCTGACTCGGTTTGGTGCGACGCCGACGGTGTGCGACGACGATCCGG 

TCATGCTGCGACCGCACGCCGAACGTGGGCTGCCGACCGTTAGTTCCTCGGACGCGGTGCAGC 

AGATAACCGGGTATGCGCTGGTGGTCGCCAGTCCCGGCTTCTCGCCCGCAACCCCGCTACTGG 

CCGCGGCCGCGGCGGCGGGGGTGCCGATCTGGGGTGACGTGGAGTTAGCCTGGCGGCTAGA 

CGCAGCGGGCTGCTACGGACCGCCGCGCAGCTGGCTGGTGGTGACCGGCACCAACGGCAAGA 

CCACCACGACGTCGATGCTGCACGCCATGCTGATCGCCGGTGGCCGCCGCGCCGTGCTGTGC 

GGCAATATCGGCAGTGCGGTGCTGGATGTGCTGGACGAGCCGGCCGAGCTGCTGGCCGTGGA 

GTTGTCCAGTTTCCAGCTGCACTGGGCGCCGTCGCTGCGGCCCGAGGCCGGCGCGGTGCTCA 

ACATTGCCGAAGACCACCTGGACTGGCATGCCACGATGGCCGAATACACCGCGGCCAAGGCCC 

GGGTGCTGACCGGCGGGGTAGCGGTGGCCGGGCTGGATGACAGCCGAGCGGCCGCACTGCT 

GGACGGCTCACCGGCGCAGGTGCGGGTCGGCTTCCGGCTCGGCGAGCCGGCCGCGCGGGAA 

CTGGGCGTGCGCGACGCCCACCTGGTCGATCGCGCCTTCTCCGACGACTTGACGCTGCTGCCG 

GTCGCGTCGATACCGGTGCCAGGTCCGGTCGGCGTGCTTGACGCCCTGGCCGCGGCGGCGCT 

GGCCCGGTCGGTCGGGGTGCCCGCCGGTGCGATCGCCGACGCGGTCACGTCGTTTCGAGTGG 

GCCGACACCGCGCCGAGGTGGTGGCCGTTGCCGACGGCATCACCTACGTGGACGACTCCAAG 

GCCACCAACCCGCACGCCGCGCGGGCTTCGGTGCTTGCATACCCGAGGGTGGTATGGATCGC 

CGGTGGCCTGCTCAAGGGCGGGTCGCTTCACGCCGAGGTTGCGGCGATGGCGTCGCGGCTGG 

TCGGTGCGGTGCTGATCGGCCGGGATCGCGCAGCGGTTGCCGAGGCGTTATCACGACACGCG 

CCCGATGTCCCAGTCGTTCAGGTTGTGGCAGGCGAGGATACTGGTATGCCTGCGACTGTTGAG 

GTTCCTGTTGCTTGTGTTCTAGATGTGGCAAAAGATGACAAAGCCGGTGAGACCGTTGGCGCTG 

CCGTGATGACCGCTGCGGTGGCCGCGGCCCGGCGGATGGCCCAACCCGGTGACACCGTGCTG 

CTGGCACCGGCCGGCGCCTCATTCGACCAGTTCACCGGTTATGCCGACCGGGGCGAGGCATTC 

GCGACCGCGGTCCGCGCGGTGATCCGGTAG 

>Rv2156c murX TB.seq 2416397:2417473 MW:37714 

>emb|AL123456|MTBH37RV:c2417473-2416394. murX SEQ ID NO:80 
ATGAGGCAGATCCTTATCGCCGTTGCCGTAGCGGTGACGGTGTCGATCTTGCTGACCCCGGTG 
CTGATCCGGTTGTTCACTAAGCAGGGCTTCGGCCACCAGATCCGTGAGGATGGCCCGCCCAGC 
CACCACACCAAGCGCGGTACGCCGTCGATGGGCGGGGTGGCGATTCTGGCCGGCATCTGGGC
GGGCTACCTGGGCGCCCACCTAGCGGGCCTGGCGTTTGACGGTGAAGGCATCGGCGCATCGG

GTCTGTTGGTGCTGGGCCTAGCCACCGCTTTGGGCGGCGTCGGGTTCATCGACGATCTGATCA 

AGATCCGCAGGTCGCGCAATCTCGGGTTGAACAAGACGGCCAAGACCGTCGGGCAGATCACCT 

CCGCCGTGCTGTTTGGCGTGCTGGTGCTGCAGTTCCGGAATGCTGCCGGCCTGACACCGGGCA 

GCGCGGATCTGTCCTACGTGCGTGAGATCGCCACCGTCACATTGGCGCCGGTGCTGTTCGTGT 

TGTTCTGCGTGGTCATCGTCAGCGCGTGGTCGAACGCGGTCAACTTCACCGATGGCCTGGACG 

GGCTGGCCGCCGGCACCATGGCGATGGTCACCGCCGCCTACGTGCTGATCACCTTCTGGCAGT 

ACCGCAACGCGTGCGTGACGGCGCCGGGCCTGGGCTGCTACAACGTGCGCGACCCGCTGGAC 

CTGGCGCTCATCGCGGCCGCAACCGGTGGCGCCTGCATCGGTTTTTTGTGGTGGAACGCCGCG 

CCCGCCAAGATCTTCATGGGTGACACTGGGTCGCTGGCGTTGGGCGGCGTCATCGCGGGGTTG 

TCGGTGACCAGCCGCACCGAGATCCTTGCGGTGGTGCTGGGTGCGCTGTTCGTCGCCGAGATC 

ACCTCGGTGGTGTTGCAAATCCTGACCTTCCGGACCACCGGGCGCCGGATGTTTCGGATGGCG 

CCCTTCCACCACCATTTCGAGTTGGTCGGTTGGGCTGAAACCACGGTCATCATCCGG1TCTGGC 

TGCTCACCGCGATCACCTGCGGTCtGGGCGTGGCCTTGTTCTACGGTGAGTGGCTTGCCGCGG 

TCGGTGCCTGA 

>Rv2157c murF TB.seq 2417473:2419002 MW:51634 
>emb|AL123456|MTBH37RV:c2419002-2417470. murF SEQ ID NO:81

ATGATCGAGCTGACCGTCGCGCAGATCGCCGAGATCGTCGGGGGCGCAGTGGCCGATATCTCC 

CCGCAAGACGCCGCGCACCGCCGCGTCACCGGGACCGTCGAGTTCGACTCGCGCGCCATCGG 

CCCGGGCGGGCTGTTCCTCGCCCTGCCGGGGGCGCGCGCCGACGGCCACGACCATGCCGCG 

TCGGCGGTAGCCGCGGGCGCCGCCGTCGTGCTGGCCGCCCGCCCGGTGGGGGTGCCGGCCA 

TCGTGGTTCCGCCAGTGGCCGCGCCGAACGTATTGGCCGGCGTCCTCGAGCACGACAACGAC 

GGGTCGGGGGCGGCGGTGCTGGCCGCGCTGGCCAAGCTGGCCACCGCGGTGGCCGCGCAGT 

TGGTGGCCGGCGGGCTCACCATCATCGGGATCACCGGCTCGTCGGGCAAGACGTCGACCAAG 

GACCTGATGGCCGCCGTGCTGGCCCCGCTGGGGGAGGTGGTGGCCCCGCCCGGATCGTTCAA 

CAACGAGCTGGGTCACCCGTGGACGGTGCTGCGCGCGACGCGGCGCACCGACTACCTGATTTT 

GGAGATGGCGGCACGCCATCACGGCAACATCGCCGCGCTCGCCGAGATCGCGCCCCCGTCGA 

TCGGAGTCGTGCTCAACGTCGGCACCGCACATTTGGGTGAGTTCGGCTCCCGCGAGGTCATCG 

CACAGACCAAAGCCGAACTGCCGCAGGCTGTTCCGCATTCCGGAGCGGTCGTCCTCAACGCTG 

ATGACCCCGCGGTGGCGGCGATGGCCAAGCTGACCGCGGCCCGGGTGGTGCGGGTCAGCCG 

GGACAACACCGGTGACGTTTGGGCGGGGCCGGTGTCGCTGGACGAATTGGCCAGGCCGCGCT 

TTACGCTGCATGCCCACGATGCCCAAGCCGAGGTCCGACTCGGGGTCTGCGGCGACCACCAG 

GTCACTAACGCGCTGTGCGCCGCGGCGGTCGCGCTGGAGTGTGGGGCCAGCGTTGAACAGGT 

CGCGGCCGCGCTGACCGCGGCGCCGCCGGTGTCGCGGCATCGGATGCAGGTGACCACCCGC 

GGCGACGGGGTGACGGTGATCGACGACGCCTACAACGCCAACCCCGACTCCATGCGGGCCGG 

GCTGCAGGCGCTGGCCTGGATCGCGCACCAACCCGAGGCCACCCGCCGCAGCTGGGCGGTGC
TGGGTGAGATGGCCGAGCTGGGTGAGGACGCGATAGCCGAGCACGATCGCATCGGCCGGCTC

GCGGTGCGCTTAGATGTGTCTCGACTCGTTGTCGTGGGAACCGGGAGGTCGATCAGCGCCATG 

CACCACGGAGCGGTCCTGGAGGGGGCGTGGGGCTCGGGGGAAGCCACTGCTGATCACGGTGC 

GGATCGCACGGCCGTCAATGTGGCCGACGGTGACGCCGCCCTGGCACTACTGCGCGCCGAGC 

TGCGACCCGGGGATGTGGTCTTGGTCAAGGCCTCGAACGCGGCCGGGCTGGGTGCGGTGGCC 

GATGCATTGGTCGCAGACGACACATGCGGGAGTGTGCGCCCATGA 

>Rv2158c murE TB.seq 2419002:2420606 MW:55310 
>emb|AL123456|MTBH37RV:c2420606-2418999. murE SEQ ID NO:82 

gtgtcatcgctggcccgagggatctcgcggcggcgaacggaggtggcgacacaggtggaggc 

tgcgcccactggcttgcgccccaacgccgtcgtgggcgttcggttggccgcactggccgatca 

ggtcggcgcggccctggccgagggtccagctcagcgtgccgtcaccgaggaccggacggtca 

ccggggtcacgctgcgcgccx^aggacgtgtcacccggtgacctgttcgccgccctgaccggc 

tcgaccacccacggggcccgccacgtcggcgacgcgatcgcacgcggcgcx:gtcgcggtgct 

caccgaccccgccggggtcgccgagatcgccggacgagcggccgtgcccgtgttggtgcacc 

ccgcaccccgcggcgtgctcggcggcttggccgccaccgtgtacgggcatccgtccgagcgg 

ttgacggttatcgggatcaccggaacgtccggcaagaccaccaccacctatctggtcgaggcc 

gggttacgggctgccggacgcgtcgccgggctgatcggcaccatcggcatccgcgtcggcgg 

cgccgaccttcccagcgcgctgaccaccccggaggcxdcccacgctgcaggcgatgctggcgg 

cgatggtcgaacgcggggtggacaccgtggtcatggaggtgtccagccacgcgctggcgctg 

ggccgggtggacggcacccggttcgccgtcggcgccttcaccaatctctcccgtgaccacctg 

gatttccaccccagcatggccgactacttcgaggccaaggcgtcattgttcgatccggactggg 

cactgcgcgcgcgcaccgccgtggtgtgcatcgacgacgacgccgggcgcgcgatggcggc 

gcgggccgccgacgcgatcaccgtcagcgccgccgaccggcccgcacactggcgcgccacg 

gatgtggcgcccacggacgcgggcgggcaacaattcaccgccatcgaccccgccggcgtagg 

gcatcacatcggaatccggctaccgggccgctacaacgtcgccaattgcctggtcgccctggc 

gattctggacaccgtcggggtctccccggaacaggcggtgccgggcctgcgtgagatccggg 

tcccggggcggctcgagcagatcgaccgcggccaggggtttctcgcgctggtcgactacgcg 

cacaaaccggaagcgctgcggtcggtgctgaccaccttggcgcacccggaccgccggctggc 

ggtggtgttcggcgccggcggcgatcgtgacccgggcaagcgggccccgatgggccggata 

gccgcgcagctggccgacttggtggtcgtcaccgacgacaacccgcgtgacgaagatcccac 

ggcgatccgccgcgaaatcgtggctggggcggccgaagtcggcggtgatgcccaggtcgtcg 

agatcgcagaccggcgggacgcgatccggcacgcggttgcctgggcgcgccccggcgacgt 

ggtgctcatcgccggcaaaggccacgagaccgggcaacgcggcggcgggcgggtccgcccg 

ttcgacgaccgggtggagctggctgccgcgctagaggccctcgagcggcgcgcatga 

>Rv2159c - TB.seq 2420632:2421663 MW:36377 

>emb|AL123456|MTBH37RV:c2421653-2420629. Rv2159c SEQ ID NO:83

ATGAAATTTGTCAACCATATTGAGCCCGTCGCGCCCCGCCGAGCCGGCGGCGCGGTCGCCGAG 

GTCTATGCCGAGGCCCGCCGCGAGTTCGGCCGGCTGCCCGAGCCGCTCGCCATGCTGTCCCC 

GGACGAGGGACTGCTCACCGCCGGCTGGGCGACGTTGCGCGAGACACTGCTGGTGGGCCAGG 

TGCCGCGTGGCCGCAAGGAAGCCGTCGCCGCCGCCGTCGCGGCCAGCCTGCGCTGCCCCTGG 

TGCGTCGACGCACACACCACCATGCTGTACGCGGCAGGCCAAACCGACACCGCCGCGGCGAT 

CTTGGCCGGCACAGCACCTGCCGCCGGTGACCCGAACGCGCCGTATGTGGCGTGGGCGGCAG 

GAACCGGGACACCGGCGGGACCGCCGGCACCGTTCGGCCCGGATGTCGCCGCCGAATACCTG 

GGCACCGCGGTGCAATTCCACTTCATCGCACGCCTGGTCCTGGTGCTGCTGGACGAAACCTTC 

CTGCCGGGGGGCCCGCGCGCCCAACAGCTCATGCGCCGCGCCGGTGGACTGGTGTTCGCCCG 

CAAGGTGCGCGCGGAGCATCGGCCGGGCCGCTCCACCCGCCGGCTCGAGCCGCGAACGCTG 

CCCGACGATCTGGCATGGGCAACACCGTCCGAGCCCATAGCAACCGCGTTCGCCGCGCTCAGC 

CACCACCTGGACACCGCGCCGCACCTGCCGCCACCGACTCGTCAGGTGGTCAGGCGGGTCGT 

GGGGTCGTGGCACGGCGAGCCAATGCCGATGAGCAGTCGCTGGACGAACGAGCACACCGCCG 

AGCTGCCCGCCGACCTGCACGCGCCCACCCGTCTTGCCCTGCTGACCGGCCTGGCCCCGCAT 

CAGGTGACCGACGACGACGTCGCCGCGGCCCGATCCCTGCTCGACACCGATGCGGCGCTGGT 

TGGCGCCCTGGCCTGGGCCGCCTTCACCGCCGCGCGGCGCATCGGCACCTGGATCGGCGCCG 

CCGCCGAGGGCCAGGTGTCGCGGCAAAACCCGACTGGGTGA 

>Rv2163c pbpB TB.seq 2425049:2427085 MW:72506 
>emb|AL123456|MTBH37RV:c2427085-2425046. pbpB SEQ ID NO:84 

GTGAGCCGCGCCGCCCCCAGGCGGGCCAGTCAGTCGCAGTCGACGCGACCGGCGCGCGGTTT 

GCGCCGGCCACCGGGAGCCCAGGAGGTTGGGCAACGCAAACGGCCCGGCAAAACGCAGAAAG 

CCCGGCAAGCCCAGGAAGCCACGAAATCCCGCCCTGCGACACGGTCAGACGTCGCACCCGCG 

GGTCGCTCGACTCGTGCGAGGCGCACCCGGCAGGTGGTGGACGTCGGGACGCGCGGTGCGTC 

GTTCGTCTTTCGGCATCGGACCGGAAACGCGGTCATCTTGGTGTTGATGTTGGTCGCGGCAACA 

CAATTGTTCTTTCTGCAGGTATCACATGCCGCGGGCCTGCGTGCGCAGGCGGCCGGCCAACTC 

AAGGTCACCGACGTCCAGCCAGCGGCTCGCGGCAGCATCGTCGACCGCAACAATGAGCGGCTC 

GCGTTCACCATCGAGGCGCGTGCCCTGACGTTCCAGCCGAAGCGGATTCGGCGGCAATTGGAA 

GAGGCCAGGAAGAAGACGTCGGCTGCACCCGACCCGCAGCAGCGCCTGCGCGATATCGCCCA 

GGAGGTCGCCGGCAAGCTGAACAACAAGCCAGATGCCGCGGCCGTGCTGAAGAAGCTGCAAA 

GCGACGAGACCTTCGTCTACTTGGCGCGTGCGGTCGACCCGGCTGTCGCCAGCGCGATCTGCG 

CGAAGTATCCCGAGGTCGGTGCGGAAAGACAGGATCTGCGTCAGTACCCGGGTGGGTCGCTG 

GCGGCAAACGTCGTCGGTGGCATCGACTGGGATGGTCATGGGCTGCTGGGTCTGGAGGACTCC 

CTGGATGCGGTGCTGGCCGGAACCGACGGATCGGTCACCTACGACCGTGGGTCAGACGGCGT 

CGTCATCCCCGGCAGCTACCGGAATCGGCACAAGGCGGTCCACGGTTCCACCGTCGTGCTCAC 

CCTCGACAACGACATCCAGTTCTACGTGCAGCAGCAGGTGCAGCAGGCCAAGAACCTATCGGG
GGCTCACAACGTCTCGGCCGTCGTCCTGGACGCCAAGACCGGCGAGGTGCTCGCGATGGCCA

ACGACAACACCTTCGACCCGTCGCAAGACATCGGGCGCCAGGGCGACAAGCAGTTGGGCAACC 

CGGCGGTGTCGTCGCCCTTCGAGCCGGGCTCGGTGAACAAGATCGTCGCCGCGTCCGCGGTC 

ATCGAGCACGGGTTGAGCAGCCCCGACGAGGTGCTACAGGTGCCTGGCTCGATCCAGATGGG 

CGGTGTTACCGTGCATGACGCTTGGGAGCACGGCGTGATGCCCTATACCACCACGGGGGTGTT 

CGGAAAGTCCTCCAACGTCGGCACGCTGATGCTTTCCCAACGTGTCGGACCGGAACGCTATTAC 

GATATGCTCCGCAAGTTCGGGTTGGGACAGCGCACCGGCGTGGGCCTGCCCGGTGAGAGCGC 

CGGACTGGTGCCGCCAATCGACCAGTGGTCGGGCAGTACGTTCGCTAATCTTCCTATTGGCCAA 

GGTCTTTCGATGACTTTGCTGCAGATGACCGGCATGTACCAGGCCATCGCCAACGATGGAGTGC 

GGGTACCCCCACGCATTATCAAGGCCACCGTCGCACCCGACGGCAGCCGAACCGAAGAACCGC 

GCCCCGACGACATTCGCGTGGT6TCGGCGCAGACCGCCCAGACCGTGCGCCAGATGCTGCGT 

GCCGTGGTGCAACGCGATCCGATGGGCTACCAGCAGGGTACCGGGCCGACGGCCGGGGTGCC 

CGGCTATCAGATGGCCGGCAAGACCGGTACCGCGCAGCAGATCAACCCTGGCTGCGGCTGCTA 

CTTCGACGACGTGTATTGGATCACCTTCGCCGGAATCGCCACTGCCGACAATCCCCGCTACGTG 

ATCGGCATCATGTTGGACAACCCGGCGCGCAACTCCGACGGCGCGCCTGGGCACTCGGCCGC 

CCCGCTGTTCCACAACATCGCGGGCTGGCTGATGCAGCGCGAAAACGTCCCGCTGTCACCCGA 

TCCCGGGCCTCCTTTGGTCTTGCAGGCCACCTAG 

>Rv2165c - TB.seq 2428236:2429423 MW:42498 

>emb|AL123456|MTBH37RV:c2429423-2428233. Rv2165c SEQ ID NO:85

GTGCAAACCCGTGCACCGTGGTCTCTGCCCGAAGCGACCCTGGCGTAGTTCCCCAACGCCAGG 

TTCGTGTCTTCGGACAGGGACCTCGGTGCAGGGGCGGCGCCTGGAATAGCCGCGTCCCGAAGT 

ACGGCTTGCCAGACCTGGGGAGGTATCACGGTGGCTGATCCAGGTTCGGGGCCAACCGGTTTC 

GGTCATGTGCCGGTATTGGCGCAAGGTTGCTTCGAACTGCTTACCCCCGCACTAACCCGCTACT 

ATCCAGACGGCTCGCAGGCGGTCCTTCTCGACGCGACCATCGGCGCGGGCGGGCATGCGGAG 

CGGTTTTTGGAGGGATTGCCGGGTCTGCGCCTGATCGGGCTCGACCGTGACCCAACCGCTCTG 

GACGTCGCGCGGTCTCGGCTGGTGCGATTCGCTGACCGACTTACCCTGGTGCACACCCGCTAT 

GACTGTCTGGGCGCAGCGCTGGCTGAATCCGGTTATGCCGCAGTGGGATCAGTCGACGGAATC 

CTGTTCGATCTCGGCGTCTCATCCATGCAGCTCGACCGCGCCGAGCGGGGCTTCGCCTACGCC 

ACGGACGCGCCATTGGACATGCGGATGGACCCGACGACGCCGTTGACCGCAGCTGACATTGTC 

AACACTTACGACGAGGCGGCACTAGCCGACATCCTGCGTCGCTACGGAGAGGAGCGGTTTGCT 

CGGCGCATCGCTGCCGGTATCGTCCGCCGACGCGCAAAAACCCCGTTCACCTCGACCGCCGAA 

CTGGTTGCCCTGCTGTACCAGGCGATTCCAGCTCCGGCCCGGCGTGTCGGCGGGCATCCAGCC 

AAGCGAACATTCCAGGCGCTGCGCATCGCGGTCAACGATGAGCTGGAATCGCTGCGCACGGCC 

GTTCCTGCCGCGCTGGATGCCCTCGCTATCGGTGGGCGCATCGCX3GTGCTGGCCTACCAGTCG 

CTAGAGGACAGGATCGTCAAACGGGTGTTCGCCGAGGCAGTCGCGTCGGCCACCCCTGCGGG 

ACTTCCGGTCGAACTTCCCGGCCATGAGCCGCGATTCCGTTCGTTAACGCACGGCGCCGAACG
AGCGAGTGTGGCTGAGATCGAACGCAATCCCCGCAGTACTCCAGTGCGGTTGCGGGCCCTGCA
ACGAGTCGAGCACCGGGCGCAATCGCAGCAATGGGCAACCGAGAAGGGTGATTCATGA 

>Rv2166c - TB.seq 2429428:2429856 MW:15912 

>emb|AL123456|MTBH37RV:c2429856-2429425. Rv2166c SEQ ID NO:86

ATGTTTCTCGGCACCTACACGCCCAAACTCGACGACAAGGGGCGGCTGACGCTGCCGGCCAAG 

TTTCGCGACGCGTTGGCAGGGGGGTTGATGGTCACCAAGAGCCAAGATCACAGCCTGGCCGTT 

TACCCGCGGGCGGCGTTCGAGCAGCTGGCGCGCCGGGCCAGCAAGGCGCCACGAAGCAACC 

CCGAGGCGAGAGCGTTCCTACGTAATCTCGCCGCCGGTACCGACGAACAGCATCCCGACAGTC 

AAGGCCGGATCACCTTGTCGGCCGACCACCGCCGCTACGCAAGCCTTTCCAAGGACTGTGTGG 

TGATCGGCGCGGTCGACTATCTCGAGATCTGGGATGCGCAAGCCTGGCAGAACTACCAACAAAT 

CCATGAAGAGAACTTCTCCGCGGCCAGCGATGAAGCACTCGGTGACATCTTCTGA 

>Rv2197c - TB.seq 2461505:2462146 MW:22481 

>emb|AL123456|MTBH37RV:c2462146-2461502. Rv2197c SEQ ID NO:87

ATGGTGAGCAGATATTCCGCATACCGGCGTGGGCCGGATGTAATCTCGCCGGACGTCATCGAT 

CGCATCCTGGTTGGGGCATGTGCCGCGGTGTGGCTGGTGTTCACCGGCGTGTCGGTGGCCGC 

CGCTGTCGCCCTGATGGACCTGGGTAGGGGCTTCCACGAGATGGCCGGAAACCCGCACACCAC 

GTGGGTGCTGTACGCCGTAATTGTGGTCTCCGCACTGGTCATCGTGGGCGCGATACCGGTGCT 

GTTGCGAGCTCGCCGCATGGCTGAGGCCGAGCCCGCGACGAGGCCGACGGGTGCATCCGTGC 

GGGGCGGGCGATCGATCGGATCCGGGCATCCGGCGAAACGCGCTGTGGCCGAGTCGGCACCC 

GTACAGCACGCGGATGCATTCGAGGTGGCCGCCGAGTGGTCCAGTGAGGCGGTGGACCGGAT 

CTGGTTGCGCGGGACAGTCGTGTTGACCAGTGCGATTGGCATTGCGTTGATTGCCGTGGCGGC 

GGCGACCTACCTCATGGCGGTCGGTCACGACGGGCCATCTTGGATCAGCTACGGGTTGGCCGG 

GGTGGTCACCGCGGGCATGCCGGTGATCGAGTGGCTATACGCTCGGCAGCTGCGCCGGGTGG 

TGGCGCCCCAGTCCAGTTAG 

>Rv2198c - TB.seq 2462149:2463045 MW:30955 

>emb|AL123456|MTBH37RV:c2463045-2462146. mmpS3 SEQ ID NO:88 

ATGAGCGGGCCGAATCCCCCGGGACGGGAACCTGACGAACCCGAATCGGAACCCGTCAGCGA 

CACGGGCGACGAACGGGCTTCCGGCAACCACTTGCCGCCCGTCGCCGGGGGCGGCGACAAAC 

TGCCCAGTGACCAGACGGGCGAGACCGACGCATATTCTCGGGCATACTCTGCCCCGGAATCCG 

AGCACGTCACCGGCGGCCCGTATGTGCCAGCCGATCTCAGGCTCTATGACTACGACGACTATG 

AGGAGTCGTCCGACCTGGACGACGAACTGGCCGCTCCGCGCTGGCCGTGGGTGGTCGGTGTC 

GCCGCCATAATTGCCGCCGTTGCGCTCGTGGTTTCGGTGTCGTTGCTCGTCACGCGACCACATA 

CCAGCAAACTCGCCACCGGCGACACTACGTCCTCTGCACCGCCCGTGCAGGACGAAATCACGA 

CCACCAAGCCGGCGCCGCCACCGCCGCCACCAGCCCCACCGCCCACCACCGAGATCCCGACA
GCGACGGAGACACAGACGGTCACTGTGACGCCGCCACCACCGCXJCCCACCGGCGACAACCAC

GGCGCCGCCGCCGGCGACCACCACAACGGCGGCGGCACCGCCGCCCACGACCACCACGCCG 

ACCGGTCCGCGGCAAGTCACCTATTCGGTGACCGGTACCAAGGCGCCGGGTGACATTATCTCG 

GTGACTTACGTCGATGCCGCCGGGCGCCGACGGACACAGCACAATGTGTACATCCCGTGGTCC 

ATGACGGTCACCCCGATCTCGCAATCCGACGTTGGCTCGGTGGAGGCCTCCAGCCTTTTCCGG 

GTCAGCAAACTCAACTGCTCGATCACCACGAGCGACGGAACGGTGCTCTCATCGAACTCCAACG 

ATGGACCGCAAACGAGCTGCTGA 

>Rv2199c - TB.seq 2463234:2463650 MW:14866 

>emb|AL123456|MTBH37RV:c2463650-2463231. Rv2199c SEQ ID NO:89 

ATGCATATCGAAGCCCGACTGTTTGAGTTTGTCGCCGCGTTCTTCGTGGTGACGGCGGTGCTGT 

ACGGCGTGTTGACCTCGATGTTCGCCACCGGTGGTGTCGAGTGGGCTGGCACCACTGCGCTGG 

CGCTTACCGGCGGCATGGCGTTGATCX3TCGCCACCTTCTTCCX3GTTTGTGGCCCGCCGGTTAG 

ATTCCCGGCCCGAGGACTACGAAGGCGCTGAAATCAGCGACGGCGCAGGAGAACTTGGATTCT 

TCAGTCCGCATAGCTGGTGGCCGATCATGGTCGCGTTGTCCGGCTCGGTGGCAGCGGTCGGCA 

TCGCGTTGTGGCTCCCGTGGCTGATCGCCGCCGGTGTGGCATTCATCCTCGCCTCGGCGGCCG 

GATtGGTCTTCGAATATTACGTCGGTCCTGAGAAGCACTGA 



>Rv2200c ctaC TB.seq 2463661:2464749 MW:40449 
>emb|AL123456|MTBH37RV:c2464749-2463658. ctaC SEQ ID NO:90

GTGACACCTCGCGGGCCAGGTCGTTTGCAACGCTTGTCGCAGTGCAGGCCTCAGCGCGGCTCC 

GGAGGGCCTGCCCGTGGTCTTCGACAGCTGGCGCTCGCAGCAATGCTGGGGGCATTGGCCGT 

CACCGTCAGTGGATGCAGCTGGTCGGAAGCCCTGGGCATCGGTTGGCCGGAGGGCATTACCC 

CGGAGGCACACCTCAATCGAGAACTGTGGATCGGGGCGGTGATCGCCTCCCTGGCGGTTGGG 

GTAATCGTGTGGGGTCTCATCTTCTGGTCCGCGGTATTTCACCGGAAGAAGAACACCGACACTG 

AGTTGCCCCGCCAGTTCGGCTACAACATGCCGCTAGAGCTGGTTCTCACCGTCATACCGTTCCT 

CATCATCTCGGTGCTGTTTTATTTCACCGTCGTGGTGCAGGAGAAGATGCTGCAGATAGCCAAG 

GATCCCGAGGTCGTGATTGATATCACGTCTTTCCAGTGGAATTGGAAGTTTGGCTATCAAAGGGT 

GAACTTCAAAGACGGCACACTGACCTATGATGGTGCCGATCCGGAGCGCAAGCGCGCCATGGT 

TTCCAAGCCAGAGGGCAAGGACAAGTACGGCGAAGAGCTGGTCGGGCCGGTGCGCGGGCTCA 

ACACCGAGGACCGGACCTACCTGAATTTCGACAAGGTCGAGACGTTGGGCACCAGCACCGAAA 

TTCCGGTGCTGGTGCTGCCGTCCGGCAAGCGTATCGAATTCCAAATGGCCTCAGCCGATGTGAT 

ACACGCATTCTGGGTGCCGGAGTTCTTGTTCAAGCGTGACGTGATGCCTAACCCGGTGGCAAAC 

AACTCGGTCAACGTCTTCCAGATCGAAGAAATCACCAAGACCGGAGCATTCGTGGGCCACTGCG 

CCGAGATGTGTGGCACGTATCACTCGATGATGAACTTGGAGGTCCGCGTCGTGACCCCCAACG 

ATTTCAAGGCCTACCTGCAGCAACGCATCGACGGGAAGACAAACGCCGAGGCCCTGCGGGCGA 

TCAACCAGCCGCCCCTTGCGGTGACCACCCACCCGTTTGATACTCGCCGCGGTGAATTGGCCC 

CGCAGCCCGTAGGTTAG

>Rv2427c proA g-glutamyl phosphate reductase TB.seq 2724231:2725475 MW:43746
>emb|AL123456|MTBH37RV:c2725475-2724228. proA SEQ ID NO:91 

ATGACCGTGCCAGCACCGTCGCAGCTCGACTTGCGTCAAGAGGTGCACGACGCCGCACGCCG 

CGCCCGGGTGGCCGCCCGCCGGCTGGCATCGCTGCCGACGACTGTCAAAGACCGCGCGCTGC 

ACGCGGCTGCCGACGAGCTACTGGCTCACCGCGACCAGATCCTGGCGGCCAACGCCGAAGAC 

CTGAACGCGGCGCGCGAGGCGGACACCCCGGCCGCCATGCTGGACCGGTTGTCCTTGAACCC 

GCAACGAGTCGACGGTATCGCCGCCGGGTTGCGGCAAGTCGCGGGACTGCGCGATCCGGTCG 

GTGAAGTGCTGCGTGGCTATACCCTGCCCAACGGGCTGCAGCTGCGCCAGCAGCGCGTCCCCC 

TGGGCGTGGTCGGCATGATCTACGAGGGCCGCCCCAATGTCACCGTGGATGCCTTCGGGCTGA 

CACTCAAGTCGGGTAACGCTGCATTGCTGCGCGGCAGCTCGTCGGCCGCAAAGTCCAACGAGG 

CCCTGGTGGCGGTGTTACGCAGCGCGCTGGTCGGCCTGGAGCTGCCGGCCGACGCGGTCCAG 

CTGCTGTCGGCTGCCGACCGCGCCACCGTCACTCACCTGATTCAGGCCCGCGGCCTGGTCGAT 

GTGGTGATTCCACGCGGGGGAGCGGGCCTGATCGAGGCGGTCGTACGCGATGCCCAGGTGCC 

CACCATCGAGACCGGCGTCGGGAACTGCCATGTCTACGTGCACCAAGCGGCXSGACCTGGACGT 

GGCCGAGCGTATCTTGCTGAACTCCAAGACGCGGCGGCCCAGCGTCTGCAACGCCGCCGAGA 

CGCTGCTGGTCGACGCAGCGATCGCCGAAACGGCGTTGCCTCGATTGCTGGCCGCCCTGCAGC 

ACGCCGGTGTCACCGTACATGTCGACCCGGACGAGGCCGACCTGCGCCGCGAATACCTGTCGC 

TGGACATCGCGGTGGCGGTGGTCGACX3GTGTCGACGCTGCCATCGCCCATATCAACGAATACG 

GCACCGGGCACACAGAAGCGATTGTGACCACCAATCTTGATGCGGCCCAACGCTTTACCGAACA 

GATCGATGCGGCCGCGGTGATGGTGAACGCATCAACGGCGTTCACCGACGGCGAGCAATTCGG 

CTTCGGCGCCGAGATCGGCATCTCCACCCAGAAACTGCATGCCCGCGGACCGATGGGACTACC 

GGAATTGACGTCGACCAAGTGGATCGCATGGGGAGCCGGCCACACCCGTCCGGCCTGA 

>Rv2438c - similar to YHN4_YEAST P38795 TB.seq 2734793:2737006 MW:80492
>emb|AL123456|MTBH37RV:c2737006-2734790. Rv2438c SEQ ID NO:92 

ATGGGACTGCTCGGCGGCCAATCAGGGCCCAGGGTCGGCAGCGGCCCAGTCGGTAGCATCCC 

CACGCCGGTCAATGCCGCCATCTGCCAGCAGCGCGGGGGATTCCACGGTGTCGAGCGTGGAT 

ACTCGGCGGGTGATTCGGGCGTTCTGACGTCGCTGGGCGACAATGAAAGGACGATGAACTTTT 

ACTCCGCCTACCAGCACGGGTTCGTGCGCGTTGCCGCCTGCACTCACCACACCACCATCGGTG 

ACCCGGCGGCCAACGCCGCGTCGGTATTGGACATGGCCCGTGCGTGCCACGACGATGGCGCA 

GCGTTGGCGGTCTTTCCTGAGCTGACGCTGTCGGGCTACTCCATCGAGGACGTACTACTGCAG 

GACTCTCTGCTCGATGCCGTCGAGGACGCGCTGCTCGACCTGGTGACCGAATCCGCCGACCTG 

TTACCTGTACTGGTGGTCGGGGCTCCGCTGCGGCATCGACACCGCATCTACAACACCGCGGTC 

GTCATTCACCGCGGCGCCGTGCTCGGCGTGGTGCCCAAGTCGTATCTACCCACCTATCGCGAG 

TTCTACGAGCGGCGCCAGATGGCGCCCGGAGACGGGGAGCGGGGCACGATCCGCATCGGTGG 

CGCCGACGTGGCCTTCGGCACGGACCTGTTGTTCGCCGCGTCAGATCTACCCGGCTTTGTGTT
GCATGTGGAGATCTGCGAGGACATGTTTGTGCCGATGCCGCCCAGCGCCGAGGCGGCCCTGG

CGGGCGCGACGGTGCTGGCGAATCTGTCCGGCAGCCCGATCACCATCGGCCGTGCCGAGGAC 

CGCCGGCTGCTTGCGCGCTCGGCGTCGGCGCGGTGTCTGGCTGCCTATGTCTATGCCGCCGC 

GGGGGAGGGGGAGTCAACGACGGACCTGGCCTGGGACGGTCAGACGATGATCTGGGAGAATG 

GCGCACTGCTCGCX3GAGTCCGAACGTTTCCCCAAAGGAGTGCGCCGCAGTGTCGCCGACGTTG 

ACACCGAGTTGCTTCGGTCGGAGCGGCTGCGGATGGGCACGTTCGACGACAACCGGCGTCAC 

CACCGGGAGTTAACGGAATCGTTCCGGCGCATCGACTTCGCACTCGACCCACCGGCAGGCGAC 

ATCGGACTGCTGCGCGAGGTCGAGCGGTTCCCGTTCGTTCCGGCCGATCCGCAACGATTGCAA 

CAGGATTGCTACGAGGCCTACAACATCCAGGTGTCTGGACTCGAGCAACGGTTGCGGGCGCTG 

GACTATCCGAAGGTCGTTATCGGTGTGTCCGGGGGATTGGACTCGACGCACGCGCTGATCGTC 

GCGACCCATGCCATGGACCGCGAGGGCCGGCXGCGCAGCGACATTCTGGCGTTTGCGTTGCC 

CGGATTCGCCACCGGGGAGCACACTAAGAACAACGCGATCAAGCTGGCACGTGCGCTGGGGG 

TTACCTTCTCCGAAATCGATATCGGCGACACCGCTCGGTTGATGCTGCACACAATCGGCCATCC 

GTATTCGGTTGGCGAAAAAGTGTACGACGTCACCTTCGAGAACGTCCAGGCCGGGTTGCGCAC 

CGACTATCTTTTCCGTATCGCCAACCAGCGCGGGGGAATCGTACTGGGCACCXSGGGACCTGTC 

GGAGCTGGCACTGGGTTGGTCGACATACGGTGTCGGCGACCAGATGTCGCACTACAACGTCAA 

CGCCGGTGTGCCCAAGACGCTGATCCAGCACCTGATCCGGTGGGTCATTTCGGCGGGTGAGTT 

CGGTGAGAAGGTGGGTGAGGTATTGCAGTCGGTGCTCGACACCGAGATCACCCCCGAACTCAT 

TCCGACCGGCGAGGAGGAGCTGCAGAGCAGCGAGGCCAAGGTCGGACCTTTCGCCCTACAGG 

ACTTTTCGCTTTTTCAGGTACTGCGCTACGGATTTCGCCCGTCGAAGATTGCGI I I I IGGCCTGG 

CATGCGTGGAACGATGCGGAGCGGGGCAACTGGCCGCCCGGCTTCCCAAAGAGCGAACGCCC 

GTCCTATTCATTGGCCGAAATCCGGCATTGGCTGCAGATTTTCGTCCAGCGGTTTTATTCGTtTA 

GCCAGTTCAAGCGTTCGGCATTGCCCAACGGCCCCAAGGTGTCCCACGGGGGCGCGTTGTCGC 

CGCGTGGGGATTGGCGGGCCCCGTCGGATATGTCAGCGCGAATCTGGCTCGATCAGATCGACC 

GTGAGGTGCCCAAGGGCTAG 

>Rv2439c proB glutamate 5-kinase TB.seq 2737118:2738245 MW:38789
>emb|AL123456|MTBH37RV:c2738245-2737115. proB SEQ ID NO:93

ATGAGAAGTCCGCATCGGGACGCAATCCGGACCGCGCGCGGCCTTGTCGTGAAGGTCGGGAC 

CACGGCGCTTACCACACCGTCCGGGATGTTCGATGCCGGCCGGCTGGCCGGACTGGCCGAGG 

CGGTCGAGCGGCGGATGAAGGCGGGTTCCGACGTCGTCATCGTGTCTTCGGGCGCCATCGCC 

GCCGGCATCGAGCCGCTCGGGCTGTCCCGTCGTCCCAAAGATCTGGCGACCAAGCAGGCGGC 

GGCCAGCGTCGGGCAGGTCGCGCTGGTGAACTCGTGGAGCGCGGCGTTCGCCCGCTACGGCC 

GCACGGTGGGCCAGGTGCTGCTGACCGCGCACGACATTTCGATGCGGGTGCAGCACACCAAC 

GCCCAACGCACGCTGGATCGGCTGCGCGCGTTGCACGCGGTGGCGATTGTCAACGAGAACGA 

CACCGTGGCCACCAACGAGATCCGGTTCGGTGACAACGATCGGCTGTCTGCACTGGTGGCGCA 

CCTGGTCGGCGCCGACGCTTTGGTGCTGCTGTCGGACATCGACGGCCTCTACGACTGCGACCC 

GCGCAAAACCGCGGACGCGACGTTCATTCCGGAGGTGTCCGGGCCGGCGGATCTGGACGGTG 

TGGTCGCCGGCCGCAGTAGCCACCTGGGTACTGGCGGCATGGCGTCCAAGGTGGCGGCGGCG 

CTGTTGGCCGCCGACGCCGGGGTGCCGGTACTGCTGGCCCCCGCGGCCGACGCCGCGACCG 

CGCTCGCCGACGCGTCGGTGGGCACGGTGTTTGCGGCCCGGCCCGCGCGTCTGTCGGCCCGG 

CGGTTCTGGGTGCGTTATGCCGCCGAAGCAACCGGCGCACTGACTCTCGACGCCGGTGCGGTG 

CGCGCTGTGGTGCGACAACGCCGGTCACTGCTGGCGGCGGGTATCACCGCGGTGTCCGGCCG 

GTTTTGCGGCGGCGATGTGGTCGAACTGCGTGCACCCGACGCGGCCATGGTAGCCCGCGGGG 

TGGTTGCCTACGACGCGTCCGAGCTGGCCACCATGGTGGGCCGGTCCACCTCTGAGCTACCCG 

GCGAGCTGCGCCGCCCGGTGGTGCACGCCGACGATCTGGTCGCGGTGTCGGCGAAGCAAGCT 

AAGCAAGTTTAG 

>Rv2440c obg Obg GTP-binding protein TB.seq 2738248:2739684 MW:50430 
>emb|AL123456|MTBH37RV:c2739684-2738245. obg SEQ ID NO:94 

GTGCCTCGGTTTGTCGATCGGGTCGTCATCCACACCAGAGCGGGTTCGGGCGGTAACGGCTGC 

GCTTCGGTCCATCGCGAGAAATTCAAGCCGCTGGGCGGCCCCGATGGCGGAAATGGCGGCCG 

GGGCGGCAGCATCGTCTTCGTCGTCGATCCGCAAGTGCACACCCTGCTCGACTTCCATTTCCGC 

CCGCATCTCACCGCGGCTTCGGGGAAGCACGGGATGGGCAATAACCGCGACGGGGCCGCCGG 

CGCGGATTTGGAAGTGAAAGTTCCCGAAGGCACCGTGGTATTGGACGAGAACGGCCGGCTACT 

GGCCGACCTGGTCGGCGCGGGCACCCGCTTTGAAGCCGCCGCCGGAGGCCGTGGCGGTTTGG 

GCAACGCCGCGCTGGCTTCCCGCGTGCGTAAGGCCCCCGGTTTCGCACTCCTCGGCGAAAAGG 

GACAGTCCGGAGACCTCACCTTGGAACTCAAGACCGTCGCCGACGTCGGCCTGGTCGGGTTTC 

CGTCGGCCGGAAAATCCTCGCTGGTGTCGGCGATTTCGGCGGCCAAGCCGAAGATCGCCGACT 

ACCCGTTCACCACCCTGGTGCCCAACCTCGGTGTGGTCTCGGCTGGCGAGCACGCGTTCACCG 

TCGCCGACGTGCCGGGGTTGATCCCGGGCGCATCCCGGGGCCGTGGTCTGGGGCTGGACTTT 

CTGCGGCACATCGAGCGCTGCGCTGTACTGGTGCATGTGGTGGATTGCGCTACCGCCGAGCCG 

GGCCGCGACCCCATCTCGGAGATCGACGCGCTGGAAACGGAACTCGCGTGCTACACGCCCAC 

GCTGCAAGGGGACGCGGCTCTGGGCGATCTCGCCGCACGGCCGCGTGCGGTGGTCCTCAACA 

AAATCGATGTGCCGGAGGCCCGCGAGCTCGCGGAGTTCGTCCGTGACGACATCGCCCAGCGC 

GGCTGGCCGGTGTTCTGCGTGTCGACCGCAACCCGGGAAAACCTGCAGCCGTTGATCTTTGGG 

CTGTCGCAGATGATCTCGGACTACAACGCTGCGCGGCCGGTGGCGGTGCCACGGCGGCCGGT 

GATTCGTCCGATTCCGGTGGACGACAGCGGTTTTACCGTCGAACCCGACGGGCATGGTGGCTT 

TGTCGTCAGCGGTGCCCGGCCCGAGCGTTGGATTGACCAGACCAACTTCGACAACGACGAGGC 

CGTCGGCTATCTCGCCGACCGGCTGGCGCGCCTGGGTGTCGAGGAGGAATTGCTGAGGCTGG 

GTGCGCGGTCAGGATGCGCGGTGACCATCGGCGAGATGACGTTGGATTGGGAGCCGCAAACG 

CCTGCGGGTGAGCCGGTCGCGATGTCCGGCCGGGGCACCGATCCGCGGCTGGACAGCAACAA 

GCGGGTGGGCGCGGCCGAGCGAAAGGCCGCTCGGAGTCGGCGTCGCGAACACGGGGATGGC 

TGA 
>Rv2441c rpmA 50S ribosomal protein L27 TB.seq 2739773:2740030 MW:8969
>emb|AL123456|MTBH37RV:c2740030-2739770. rpmA SEQ ID NO:95 

ATGGCACACAAGAAGGGGGCTTCCAGCTCGCGCAACGGTCGCGATTCCGCCGCCCAGCGGCT 
GGGGGTTAAGCGGTACGGCGGCCAGGTCGTCAAGGCCGGCGAGATCCTGGTCCGCCAGCGCG 
GTACCAAATTCCATCCXX3GCGTCAACGTCGGGCGTGGCGGCGATGACACCTTGTTCGCCAAGA 
CGGCCGGGGCGGTCGAGTTCGGCATCAAACGCGGACGTAAGACGGTGAGCATCGTCGGTTCG 

ACCACTGCCTGA 

>Rv2442c rplU 50S ribosomal protein L21 TB.seq 2740048:2740359 MW:11152
>emb|AL123456|MTBH37RV:c2740359-2740045. rplU SEQ ID NO:96

ATGATGGCGACCTACGCAATCGTCAAGACCGGCGGCAAGCAGTACAAAGTCGCTGTCGGAGAT 
GTGGTCAAGGTGGAAAAGCTGGAATCCGAGCAGGGGGAGAAGGTGTCCCTGCCGGTGGCTCT 
GGTTGTCGACGGCGCCACCGTCACCACCGATGCGAAGGCACTGGCCAAGGTCGCGGTGACCG 
GTGAGGTGCTCGGGCACACCAAGGGCCCCAAGATCCGTATCCACAAGTTCAAGAACAAGACTG 
GCTACCACAAACGGCAGGGACACCGTCAGCAGCTGACGGTCCTGAAGGTCACCGGCATCGCAT 

AA 

>Rv2448c valS valyl-tRNA synthase TB.seq 2747598:2750223 MW:97822 
>emb|AL123456|MTBH37RV:c2750223-2747593. valS SEQ ID NO:97

ATGCTGCCCAAGTCGTGGGATCCGGCCGCGATGGAGAGCGCCATCTATCAGAAGTGGCTGGAC 

GCTGGCTACTTCACCGCGGACCCGACCAGCACCAAGCCGGCCTATTCGATCGTGCTGCCGCCG 

CCGAACGTGACCGGCAGCCTGCACATGGGCCACGCGCTGGAACACACCATGATGGACX3CCTTG 

ACGCGGCGCAAGCGGATGCAGGGCTATGAGGTGCTCTGGCAGCCGGGCACCGACCATGCCGG 

GATCGCCACCCAGAGCGTGGTCGAGCAGCAGCTGGCGGTCGACGGCAAGACTAAAGAAGACCT 

CGGCCGCGAGCTGTTCGTGGACAAGGTGTGGGATTGGAAGCGAGAGTCTGGCGGTGCCATCG 

GCGGCCAGATGCGCCGACTCGGTGACGGGGTGGACTGGAGGCGCGACCGGTTCACCATGGAC 

GAAGGTCTGTCGCGGGCGGTGCGCACGATCTTCAAGCGGCTTTATGACGCCGGGCTGATCTAT 

CGGGCCGAGCGGCTGGTCAACTGGTCGCCGGTGCTGCAGACCGCGATCTCCGACCTCGAGGT 

CAACTACCGGGACGTCGAAGGCGAGCTGGTGTCGTTTAGGTACGGCTCGCTTGACGACTCGCA 

ACCCCACATCGTGGTCGCCACCACCCGGGTCGAGACGATGCTGGGCGATACCGCGATCGCCGT 

CCATCCCGATGACGAGCGCTACCGTCACCTGGTCGGCACCAGCCTGGCGCACCCATTCGTCGA 

CCGGGAGCTGGCCATTGTCGCCGACGAGCACGTGGACCCTGAATTCGGCACCGGCGCGGTCA 

AAGTCACACCCGCCCACGACCCCAACGACTTCGAAATCGGGGTGCGCCACCAGCTGCCGATGC 

CCTCGATCCTGGACACCAAGGGCCGGATCGTCGACACCGGAACGCGATTCGACGGCATGGACC 

GCTTCGAGGCACGGGTCGCGGTGCGCCAAGCGCTCGCGGCCCAGGGCCGCGTGGTCGAAGAA 

AAGCGACCCTACCTGCACAGCGTCGGACACTCCGAACGCAGCGGCGAGCCGATCGAGCCGCG 
GCTATCCCTGCAGTGGTGGGTCCGGGTGGAATCGCTGGCCAAAGCGGCCGGGGATGCGGTGC
GCAACGGGGACACCGTGATTCACCCGGCCAGCATGGAACCCCGCTGGTTCTCCTGGGTCGACG 
ACATGCACGACTGGTGCATCTCGCGACAGCTCTGGTG6GGGCATCGGATCCCGATCTGGTACG 
GACCCGACGGCGAACAGGTGTGCGTCGGCCCGGACGAAACACCCCCGCAGGGCTGGGAACAG 

GATCCTGACGTGCTGGATACCTGGTTTTCGTCGGCGCTGTGGCCGTTTTCCACGCTGGGTTGGC
CGGACAAGACGGCGGAGCTGGAAAAGTTCTATCCGACAAGCGTTCTGGTTACCGGCTATGACAT 
CTTGTTCTTTTGGGTGGCCAGAATGATGATGTTCGGCACCTTCGTCGGCGACGACGCCGCCATC 
ACCCTCGACGGCCGCCGGGGCCCGCAGGTGCCGTTGACCGACGTGTTTCTGCATGGGCTGATC 
CGCGACGAGTCTGGCCGCAAGATGAGCAAGTCCAAGGGCAACGTCATCGACCCGCTGGATTGG 

GTGGAAATGTTCGGGGCCGATGCGCTGCGGTTCACGCTGGCCCGCGGGGCCAGTCCCGGTGG
TGACTTGGCGGTGAGCGAGGATGCCGTGCGGGCGTCGCGCAATTTCGGGACCAAGCTGTTCAA 
CGCCACTCGGTACGCACTGCTCAATGGCGCCGCGCCAGCACCCCTGCCATCGCCGAACGAGCT 
GACCGACGCCGACCGCTGGATTCTCGGAAGGTTGGAAGAGGTTCGGGCCGAAGTTGATTCGGC 
CTTCGACGGATACGAGTTCAGCCGCGCTTGTGAGTCCCTGTATCACTTCGCCTGGGACGAATTC 

TGCGACTGGTACCTCGAACTGGCCAAAACGCAGCTTGCCCAGGGACTCACACACACCACCGCC
GTGCTGGCCGCCGGGCTGGACACGCTGCTGCGCCTGCTGCACCCGGTGATTCCCTTCCTCACC 
GAGGCGCTATGGCTGGCGCTGACCGGCAGGGAATCGCTGGTCAGCGCCGACTGGCCGGAGCC 
TTCCGGGATTAGCGTGGACCTTGTTGCCGCGCAACGGATTAACGATATGCAGAAGTTGGTGACC 
GAAGTGCGGCGGTTCCGCAGCGATCAAGGTCTGGCCGACCGGCAGAAGGTTCCGGCCCGAAT 

GCACGGTGTGCGGGACTCGGATCTGAGCAACCAGGTGGCCGCCGTGACCTCGCTGGCGTGGC
TCACCGAGCCGGGCCCGGATTTTGAGCCGTCGGTCTCGTTGGAGGTTCGGCTCGGCCCCGAGA 
TGAACCGCACCGTCGTCGTCGAGCTCGACACCTCGGGCACCATCGACGTGGCCGCCGAGCGT 
CGCCGCCTGGAAAAGGAGTTGGCCGGCGCCCAAAAGGAGCTGGCGTCGACCGCCGCCAAGTT 
GGCCAACGCGGACTTTCTGGCCAAAGCGCCCGACGCCGTCATTGCCAAGATCCGGGACCGCCA 

GCGCGTGGCGCAGCAGGAAACCGAGCGCATCACCACCCGGTTGGCTGCGCTGCAATGA

>Rv2482c plsB2 TB.seq 2786915:2789281 MW:88284
>emb|AL123456|MTBH37RV:c2789281-2786912. plsB2 SEQ ID NO:98

GTGACCAAACCGGCGGCCGATGCCAGCGCGGTGCTTACTGCCGAGGACACACTGGTGCTGGC 
TTCCACGGCGACGCCGGTCGAGATGGAGCTGATCATGGGCTGGCTGGGCCAGCAGCGTGCAC
GCCATCCGGACTCGAAGTTCGACATATTGAAGCTGCCACCGCGCAACGCTCCGCCGGCGGCGC 
TGACGGCACTGGTCGAGCAGCTCGAGCCCGGCTTCGCATCCAGCCCGCAATCTGGCGAGGAC 
CGTTCTATCGTGCCGGTTCGGGTGATCTGGCTGCCTCCCGCCGATCGCAGCCGGGCGGGCAAG 
GTGGCCGCACTGCTCCCGGGTCGGGATCCCTACCATCCCAGCCAGCGTCAGCAGCGTCGCATC 
CTGCGTACCGATCCCAGGCGCGCGCGGGTGGTGGCCGGCGAGTCGGCCAAGGTGTCCGAAGT
GCGCCAGCAGTGGCGCGATACCACGGTGGCAGAGCACAAGCGCGATTTCGCCCAGTTCGTCAG 
CCGCCGAGCGCTGTTGGCGCTGGCGCGCGCCGAATATCGGATCCTTGGACCGCAATACAAATC 

TCCCCGGCTGGTGAAGCCGGAGATGTTGGCGTCCGCACGATTTCGTGCCGGCCTGGACCGGAT 
TCCGGGCGCCACGGTCGAAGATGCCGGGAAGATGCTCGACGAACTCTCCACCGGATGGAGCC 
AGGTGTCGGTAGACCTGGTTTCCGTCCTCGGCAGGCTGGCTAGCCGCGGCTTCGATCCGGAAT 
TCGACTACGACGAGTATCAGGTCGCGGCGATGCGCGCCGCACTGGAGGCTCATCCGGCGGTC 
CTGCTGTTCTCGCACCGGTCCTACATCGACGGCGTGGTGGTACCGGTGGCCATGCAGGACAAC
CGGTTACCGCCGGTGCACATGTTCGGCGGCATCAACCTGTCGTTCGGTCTCATGGGACCCCTC 
ATGCGGCGCTCGGGGATGATCTTCATCCGGCGCAATATCGGCAACGACCCACTGTATAAGTACG 
TGCTCAAGGAGTACGTGGGCTACGTGGTCGAGAAGCGGTTCAACCTGAGCTGGTCCATCGAAG 
GCACCCGGTCGCGCACCGGAAAGATGTTGCCGCCCAAGCTCGGTTTGATGAGCTACGTGGCCG 

ATGCTTACCTGGACGGCCGCAGTGACGACATCCTGCTGCAGGGGGTTTCGATTTGCTTCGATCA
GCTGCACGAGATCACCGAATACGCCGCCTACGCGCGTGGCGCGGAGAAGACGCCCGAAGGTT 
TGCGCTGGCTCTACAACTTCATCAAGGCGCAGGGGGAACGCAACTTCGGCAAGATCTACGTTCG 
CTTCCCCGAAGCGGTCTCGATGCGCCAGTACCTCGGCGCACCGCACGGCGAGCTGACCCAGG 
ATCCGGCCGCGAAACGGCTTGCGTTGCAGAAGATGTCGTTCGAGGTGGCCTGGAGGATTTTGC 

AGGCGACGCCGGTGACCGCGACGGGTTTGGTGTCCGCACTGCTGCTCACCACCCGCGGCACC
GCGTTGACGCTCGACCAGCTGCACCACACGTTGCAGGACTCACTGGACTATCTGGAACGCAAA 
CAATCGCCGGTTTCGACAAGCGCATTGCGACTGCGCTCGCGCGAAGGCGTCCGTGCGGCGGC 
GGACGCGTTGTCCAACGGCCACCCGGTCACTCGGGTCGACAGTGGCCGGGAGCCGGTATGGT 
ACATAGCGCCTGACGACGAGCACGCCGCGGCGTTCTACCGGAACTCGGTGATCCATGCG I I I M 

GGAGACCTCGATCGTCGAGCTCGCGCTGGCCCATGCCAAGCACGCCGAAGGTGACCGCGTCG
CCGCGTTCTGGGCCCAGGCGATGCGGTTGCGGGATCTGCTGAAGTTCGACTTCTATTTCGCGG 
ATTCCACGGCGTTTCGGGCCAACATCGCCCAAGAGATGGCCTGGCACCAAGACTGGGAGGATC 
ATCTTGGCGTCGGGGGCAATGAGATCGACGCGATGCTGTATGCCAAACGGCCGCTGATGTCGG 
ACGCGATGTTGCGGGTCTTCTTCGAAGCCTATGAGATCGTTGCCGACGTGTTGCGCGATGCTCC 

GCCTGACATCGGTCCTGAGGAGTTGACGGAGCTGGCGCTCGGCCTCGGCCGTCAGTTTGTGGC
ACAGGGCCGGGTCCGCAGCAGCGAACCGGTATCGACGCTGCTGTTCGCCACTGCACGCCAGG 
TCGCCGTCGATCAGGAGCTGATAGCGCCGGCGGCCGACCTCGCCGAACGTAGGGTCGCCTTG 
CGGCGGGAGTTAGGAAACATTCTGCGGGATTTCGACTATGTCGAGCAGATCGCGCGCAACCAG 
TTCGTCGCCTGCGAGTTCAAAGCGCGTCAAGGACGCGACCGAATCTAA 

>Rv2509 - putative oxidoreductase TB.seq 2824676:2825479 MW:28014 
>emb|AL123456|MTBH37RV:2824676-2825482. Rv2509 SEQ ID NO:99 

ATGCCGATACCCGCGCCCAGCCCCGACGCACGTGCCGTTGTCACCGGGGCTTCGCAGAACATC 
GGCGCGGCGCTGGCCACCGAACTGGCCGCACGCGGGCACCACCTGATCGTCACCGCACGACG 
CGAGGACGTGTTGACCGAGTTGGCTGCCCGGCTGGCCGACAAGTACCGCGTCACGGTCGACG
TGCGACCGGCCGATCTGGCCGATCCGCAAGAACGATCGAAACTGGCCGACGAGCTGGCTGCC 
CGGCCCATCTCGATCCTGTGCGCCAACGCGGGTACCGCGACATTCGGCCCGATCGCATCGCTC 
GATCTTGCCGGCGAAAAGACGCAGGTGCAGTTGAATGCCGTGGCGGTGCACGACCTTACGTTG

GCGGTGTTGCCGGGCATGATCGAGCGCAAGGCCGGCGGGATCTTGATTTCTGGTTCGGCGGCC 

GGCAATTCACCGATTCCGTACAACGCCACCTATGCCGCGACCAAGGCCTTCGTGAACACCTTCA 

GCGAATCTCTGCGCGGTGAGCTACGCGGCTCCGGCGTGCACGTCACGGTGCTGGCCCCGGGC 

CCGGTTCGCACCGAGCTACCGGATGCCTCCGAAGCGTCACTGGTCGAGAAGCTGGTGCCGGAC 

TTCCTGTGGATCTCGACGGAGCACACCGCCCGGGTATCGCTGAATGCCTTGGAGCGCAACAAG 

ATGCGCGTCGTTCCGGGTCTGAGGTCAAAGGCGATGTCGGTGGCCAGCCAATACGCTCCGCGC 

GCCATCGTGGCGCCAATCGTGGGTGCCTTTTACAAGAGGCTTGGGGGCAGCTAG 

>Rv2524c fas fatty acid synthase TB.seq 2840124:2849330 MW:326226
>emb|AL123456|MTBH37RV:c2849330-2840121, fas SEQ ID NO:100 

GTGACGATCCACGAGCACGACCGGGTGTCCGCTGATCGCGGCGGGGACAGCCCGCATACCAC 

CCACGCTCTGGTCGATCGCCTCATGGCTGGTGAGCCCTACGCTGTCGCATTCGGTGGCCAGGG 

CAGCGCCTGGCTGGAAACCCTCGAAGAGCTGGTGTCGGCCACGGGGATAGAAACCGAGTTGGC 

GACGTTGGTGGGTGAGGCAGAGCTGTTGCTCGATCCGGTCACCGACGAGCTGATTGTGGTGCG 

CCCGATCGGTTTCGAGCCGCTGCAATGGGTACGCGCACTGGCGGCCGAGGACCCGGTTCCGT 

CCGACAAGCACCTGACGTCGGCCGCCGTGTCGGTGCCCGGCGTGTTGCTTACCCAGATCGCGG 

CGACCCGGGCGCTGGCCCGTCAAGGCATGGACCTCGTGGCCACCCCGCCGGTCGCCATGGCG 

GGGCATTCGCAAGGTGTGCTGGCGGTGGAAGCCCTCAAGGCTGGTGGGGCACGCGACGTCGA 

GCTGTTTGCCTTGGCCCAGTTGATCGGTGCCGCCGGAACGCTGGTGGCCCGCCGGCGCGGAA 

TTTCCGTCCTGGGCGATCGCCCGCCGATGGTATCGGTCACCAACGCCGACCCCGAGCGCATCG 

GCCGGTTGCTCGACGAGTTCGCCCAGGACGTGCGCACGGTGCTGCCACCGGTGTTGTCCATCC 

GCAACGGCCGGCGTGCCGTCGTCATCACCGGCACCCCCGAGCAGCTGTCGCGTTTCGAGCTTT 

ATTGCCGCCAGATCTCCGAGAAGGAAGAAGCCGACCGGAAGAACAAGGTCCGCGGCGGCGAC 

GTCTTCTCGCCGGTCTTCGAGCCGGTGCAGGTGGAGGTGGGCTTTCACACCCCGCGGCTATCC 

GACGGGATCGACATCGTCGCGGGCTGGGCCGAGAAGGCGGGCCTCGATGTCGCCTTGGCTCG 

ggagctggccgatgccatcttgatcagaaaggtcgactgggtcgacgagatcacccgtgtcca 

cgcggccggcgcccgctggatcctcgacctggggccgggcgacatcctgacccgactgaccg 

gaccggtgatccgcggcctgggcatcggcatcgtgccggcggctacccgcggtggccagcgc 

aacctgttcaccgtcggcgccacccccgaggttgcccgggcctggtcgagctacgcaccgacc 

gtggttcgcctccccgacggcagggtcaagctctcgacgaagttcacccggctgaccggccgc 

tcgccgatcctgctcgcgggcatgaccccgaccaccgtggacgccaagatcgtcgccgcggc 

GGCCAACGCCGGGCACTGGGCCGAGCTGGCCGGCGGCGGGCAGGTCACCGAAGAGATCTTC 

ggtaaccgcatcgaacaaatggccggcctgctcgaggcgggccgcacctatcagttcaacgcg 
ctgttcctcgatccctacctgtggaagcttcaggtgggcggcaagcggttggtgcagaaggcc 
cgccagtccggcgccgcgatcgacggcgtggtgatcagcgccggcatcccagacctcgacga 
ggccgtcgagctgatcgacgaactgggcgacatcggcatcagccacgtcgtgttcaaacccgg 
GACCATCGAGCAGATCCGCTCGGTGATTCGCATCGCCACCGAGGTGCCCACCAAGCCGGTGAT

CATGCACGTCGAGGGCGGGCGCGCCGGCGGGCACCATTCCTGGGAGGATCTCGACGACCTGC 

TGCTGGCTACCTACTCGGAGTTGCGCTCACGCGCCAACATCACGGTGTGCGTCGGCGGCGGCA 

TTGGCACCCCGAGAAGGGCTGCGGAATATTTGTCCGGGCGCTGGGCGCAGGCCTACGGCTTCC 

CATTGATGCCGATCGACGGCATCCTGGTCGGCACCGCGGCGATGGCCACCAAGGAATCCACCA 

CGTCGCCATCGGTCAAGCGGATGCTCGTCGACACTCAGGGCACCGACCAATGGATCAGCGCCG 

GAAAAGCGCAGGGCGGCATGGCCTCCAGCCGCAGTCAGCTCGGTGCCGATATCCACGAGATC 

GACAACAGCGCATCCCGGTGCGGGCGGCTGCTCGACGAGGTGGCCGGTGACGCGGAGGCGG 

TCGCGGAGCGTCGCGACGAGATCATCGCGGCGATGGCCAAGACCGCCAAGCCCTACTTCGGC 

GACGTCGCCGACATGACCTACCTGCAGTGGCTGCGGCGCTACGTCGAACTGGCCATCGGGGAA 

GGCAACTCGACXGCCGACACCGCCTCGGTGGGCAGCCCGTGGCTGGCCGACACCTGGCGGGA 

CCGCTTCGAGCAGATGCTGCAGCGTGCCGAAGCCCGGTTGCACCCACAGGATTTCGGCCCGAT 

CCAGACGCTATTCACCGATGCTGGCCTGCTGGACAATCCGCAGCAGGCGATCGCCGCCCTGCT 

GGCGCGCTACCCCGACGCCGAGACCGTGCAGTTGCATCCCGCGGATGTGCCCi i i i iCGTGAC 

GTTGTGCAAGACGCTGGGCAAGCCGGTCAACTTCGTGCCGGTGATCGACCAGGACGTGCGGC 

GCTGGTGGCGCAGCGACTCGCTGTGGCAGGCCCACGACGCCCGCTACGACGCCGATGCGGTG 

TGCATCATTCCGGGCACCGCGTCGGTAGCCGGCATCACCCGGATGGATGAACCCGTCGGTGAG 

TTGCTGGACCGTTTCGAGCAAGCCGCAATCGATGAAGTGCTCGGCGCCGGTGTCGAGCCGAAG 

GATGTCGCGTCGCGCCGGCTGGGCCGCGCCGACGTGGCCGGACCGTTGGCTGTCGTCCTCGA 

CGCACCCGATGTGCGCTGGGCCGGTCGCACCGTGACCAACCCGGTGCATCGGATCGCCGACC 

CGGCCGAATGGCAGGTGCACGATGGACCCGAAAACCCGCGCGCCACACACTCATCCACCGGC 

GCCCGGCTGCAGACGCACGGCGACGACGTCGCCTTGAGCGTGCCCGTCTCGGGCACCTGGGT 

CGACATCCGATTCACGTTGCCGGCCAACACCGTCGATGGCGGCACCCCGGTGATCGCCACCGA 

GGACGCCACCAGCGCCATGCGCACGGTGCTGGCGATCGCCGCCGGTGTCGACAGCCCGGAGT 

TCTTGCCTGCGGTGGCCAACGGGACGGCCACTTTGACGGTGGACTGGCACCCCGAGCGTGTTG 

CCGACCACACCGGCGTCACCGCCACGTTCGGTGAGCCGCTGGCACCCAGCCTCACCAACGTG 

CCGGACGCGCTCGTCGGCCXJTTGTTGGCCAGCGGTTTTCGCGGCCATCGGATCGGCGGTCACC 

GACACCGGTGAGCCGGTGGTGGAAGGCCTGCTGAGCCTGGTGCATCTGGACCACGCCGCCCG 

CGTGGTCGGTCAGCTGCCCACGGTCCCGGCCCAATTGACCGTCACCGCAACGGCTGCCAACGC 

AACCGATACGGACATGGGCCGCGTCGTGCCGGTCTCGGTCGTCGTTACCGGCGCCGATGGCG 

CCGTGATCGCCACTCTCGAGGAGCGATTCGCGATCCTGGGTCGCACXDGGTTCCGCCGAGCTCG 

CCGACCCGGCGCGAGCCGGTGGCGCGGTGTCGGCGAACGCCACCGACAGCCCGCGCCGTCG 

CCGCCGCGACGTCACGATCACCGCGCCGGTCGACATGCGCCCGTTCGCGGTGGTGTCCGGCG 

ACCACAACCCCATTCACACCGACCGGGCCGCCGCGCTGCTTGCCGGCCTGGAGTCGCCGATC 

GTGCACGGCATGTGGCTGTCGGCCGCGGCGCAACACGCGGTGACCGCCACCGACGGGCAGG 

CCCGGCCACCGGCCXJGGCTGGTCGGCTGGACCGCGCGGTTTTTGGGCATGGTGCGCCCCGGC 

GACGAGGTGGACTTCCGCGTCGAGCGCGTCGGAATCGACCAGGGCGCAGAGATTGTGGACGT 
GGCCGCGCGCGTCGGGTCGGATCTAGTGATGTCGGCCTCCGCGCGACTGGCCGCACCCAAGA

CGGTCTACGCATTCCCCGGCCAGGGCATCCAACACAAGGGCATGGGCATGGAGGTGCGCGCC 

CGCTCCAAGGCGGCCCGCAAGGTGTGGGACACCGCGGACAAGTTCACCCGCGACACCCTGGG 

CTTCTCGGTACTGCACGTGGTCCGCGACAACCCGACCAGCATCATCGCCAGCGGTGTGCACTA 

CCACCACCCCGACGGGGTGCTCTACCTGACGCAGTTCACCCAGGTCGCGATGGCGACGGTGG 

CGGCCGCGCAGGTCGCCGAGATGCGTGAACAGGGAGCCTTCGTCGAAGGCGCCATCGCGTGC 

GGCCACTCGGTCGGCGAGTACACCGCGCTGGCCTGCGTGACCGGCATCTACCAACTGGAAGC 

CTTGCTGGAGATGGTGTTTCACCGCGGGTCGAAGATGCACGACATCGTTCCGCGCGACGAGCT 

CGGCCGCTCCAACTATCGGCTGGCGGCCATCCGGCCGTCCCAGATCGACCTCGACGACGCCG 

ACGTGCCCGCGTTCGTCGCCGGGATCGCGGAGAGCACCGGTGAATTCCTGGAGATCGTGAATT 

TCAACCTGCGTGGCTCGCAATACGCGATCGCGGGCACGGTACGCGGCCTCGAGGCGCTCGAG 

GCCGAGGTGGAGCGGCGCCGCGAGCTCACCGGCGGCCGACGGTCGTTCATTTTGGTGCCCGG 

CATCGATGTTCCGTTCCACTCGCGAGTGCTGCGGGTCGGGGTGGCCGAATTCCGGCGCTCGCT 

GGACCGGGTCATGCCGCGCGACGCGGACCCCGACCTGATCATCGGGCGCTACATTCCCAACCT 

GGTGCCGCGGTTGTTCACCCTGGACCGCGACTTCATCCAGGAAATCCGGGATTTGGTGCCCGC 

CGAGCCGCTCGACGAGATCCTCGCCGACTACGACACCTGGCTTCGCGAGCGTCCGCGCGAGAT 

GGCGCGCACGGTGTTCATCGAGCTGCTGGCATGGCAATTCGCCAGCCCGGTGCGCTGGATCGA 

GACGCAGGATCTGCTGTTCATCGAGGAGGCCGCCGGCGGGCTGGGTGTGGAGCGATTCGTCG 

AGATCGGTGTGAAGAGCTCACCGACG6TGGCGGGTCTTGCCACCAACACCCTCAAACTGCCCG 

AATACGCCCAGAGCACAGTGGAAGTGCTCAACGCCGAGCGTGATGCCGCGGTGCTGTTCGCCA 

CCGACACCGACCCGGAGCCGGAGCCGGAGGAAGACGAGCCGGTCGCGGAATCGCCCGCGCC 

GGACGTCGTCTCGGAAGCCGCCCCCGTCGCGCCGGCCGCTTCGTCGGCGGGCCCGCGTCCCG 

ACGATCTGGTTTTCGACGCCGCCGATGCCACGCTGGCGCTGATCGCGCTCTCGGCCAAGATGC 

GCATCGACCAGATCGAAGAAGTCGACTCCATCGAGTCCATCACCGACGGTGCGTCGTCGCGGC 

GCAACCAGCTGCTGGTGGACCTGGGCTCCGAGCTGAACCTCGGTGCCATTGACGGCGCCGCC 

GAATCGGACCTGGCCGGTCTGCGCTCACAGGTGACCAAACTGGCGCGCACCTACAAGCCTTAC 

GGCCCAGTGCTTTCCGACGCCATCAACGACCAGCTTCGCACCGTCCTCGGACCGTCGGGCAAG 

CGGCCGGGCGCCATCGCCGAGCGGGTGAAGAAGACCTGGGAGCTCGGTGAGGGCTGGGCCA 

AGCATGTCACCGTCGAGGTCGCGCTGGGCACCCGCGAGGGCAGCAGCGTTCGCGGCGGCGCC 

ATGGGCCACCTGCACGAGGGCGCGCTGGGCGATGCCGCCTCCGTCGACAAGGTCATCGACGC 

GGCGGTCGCATCGGTGGCCGCGCGCCAGGGCGTTTCGGTAGCGCTGCCGTCGGCCGGTAGTG 

GTGGCGGCGCCACCATCGACGCGGCCGCGCTCAGCGAGTTCACCGACCAAATCACCGGCCGT 

GAGGGCGTGCTGGCCTCCGCGGCCCGCCTGGTGCTGGGGCAGCTGGGACTGGACGACCCCGT 

CAACGCCTTGCCGGCCGCCCCCGATTCCGAGCTGATCGACTTGGTCACCGCCGAACTGGGAGC 

GGACTGGCCGCGGTTGGTGGCACCGGTGTTCGACCCCAAGAAGGCCGTCGTATTCGACGACG 

GCTGGGCCAGCGCCCGCGAGGACCTGGTGAAGCTGTGGCTGACCGACGAGGGCGACATCGAC 

GCCGACTGGCCGCGCCTGGCGGAGCGCTTCGAGGGTGCCGGCCACGTCGTGGCGACCCAGG 
CTACCTGGTGGCAAGGTAAGTCGCTGGCCGCGGGCCGGCAGATCCATGCATCGCTGTACGGCC

GCATCGCCGCCGGCGCCGAGAACCCCGAACCCGGCCGCTACGGCGGCGAAGTTGCCGTGGTG 

ACCGGCGCTTCGAAGGGTTCGATCGCCGCGTCGGTGGTGGCTCGGCTGCTCGACGGCGGAGC 

CACCGTCATCGCGACCACCTCCAAGCTCGAGGAGGAGCGGCTGGCGTTCTACCGCACGCTGTA 

TCGCGACCACGCCCGTTACGGCGCGGCGCTGTGGCTGGTCGCGGCGAACATGGCGTCCTACT 

CCGACGTCGACGCCCTGGTCGAATGGATCGGCACCGAACAGACCGAAAGCCTTGGGCCGCAGT 

CGATTCACATCAAAGACGCGCAGACCCCGACGCTGCTGTTCCCGTTCGCGGCGCCACGCGTGG 

TCGGGGACCTGTCGGAGGCCGGTTCGCGCGCCGAGATGGAGATGAAAGTGCTGCTGTGGGCC 

gtgcaacggctgatcggcggcctgtcgacgatcggcgccgaacgcgacatcgcgtcgcggct 

gcacgtggtgctgcccggctcgcccaaccgtggcatgttcggcggcgacggcgcx:tacggcg 

aagccaagtccgcgctggatgccgtggtgagccgctggcacgccgagtcgtcctgggcggca 

cgggtcagcctggcgcacgcgctcatcggctggacccgcggcaccgggctgatgggccacaa 

cgatgccatcgtgggcgccgtcgaagaggccggggtcaccacctactcgaccgaggagatgg 

cggcgctgctgctcgacctgtgtgatgcggaatccaaggtggctgcggcgcgttcgccgatca 

aggccgacctgaccgggggcctggccgaggccaacctcgacatggccgagctggcggccaag 

gcgcgcgagcagatgtcggcagcggcggccgtcgacgaggacgccgaggcccctggcgcca 

tcgccgcgctgccgtcgccgccccggggtttcacccccx3caccgccgccgcaatgggacgac 

ctcgatgtcgacccggccgacctggtggtgatcgtcggcggcgccgaaatcggcccgtacgg 

ctcgtcacgcaccgggttcgagatggaggtcgaaaacgagctgtcggcggccggcgtgctgg 

agctggcctggaccactgggttgatccgctgggaggacgacccgcaacccggttggtacgaca 

ccgaatccggcgaaatggtcgacgaatccgagttggtgcagcgctaccacgacgccgtggtgc 

agcgcgtcggcattcgcgaattcgttgatgacggcgcgatcgaccccgaccacgcctcgccgc 

TGCTGGTGTCGGTGTTCCTGGAGAAGGACTTCGCGTTCGTGGTGTCGTCGGAGGCCGATGCGC 

gcgccttcgtcgagttcgatgcggagcacacggtcatccggccggtgcccgactccaccgact 

GGCAGGTCATCCGCAAGGCCGGCACCGAGATCCGGGTGCCGCGAAAGACCAAGCTGTCCCGC 

gtcgtcggcggccagatcccgaccgggttcgacccgacggtgtggggcatcagcgcagacat 

ggccggttccatcgaccggttggcggtatggaacatggtggcgaccgtcgacgcgttcctgtc 

gtccggtttcagcccggccgaggtgatgcgttacgtgcacccgagtttggtggccaacaccca 

gggcagcggcatgggcggcggcacgtcgatgcagacgatgtaccacggcaatctgttgggcc 

gcaacaaggcgaacgacatcttccaggaagtcttgccgaatatcattgccgcgcacgtggttca 

gtcctacgtcggtagctacggtgcgatgatccacccggtagccgcgtgcgccaccgccgcggt 

gtcggtcgaggaaggtgtcgacaagatccggttgggcaaggctcaactggtggtggccggcg 

gcctggatgacctgacgctggagggcatcatcggattcggtgacatggccgccaccgccgaca 

cgtgcatgatgtgcggccgcggcatccacgactcgaagttttcccggcccaacgaccgccgcc 

gtctgggcttcgtcgaagcccaaggcggcgggacgatcctgttggcccggggggacctggcg 

ctgcggatggggctgcgggtggtggcggtggtggcgttcgcgcagtcgttcggcgacggcgt 

gcacacctcgatcccggccccgggcctgggcgcgctgggggggggccgcggcggcaaggat 
TCACCGCTGGCGCGGGCGCTGGCCAAGCTGGGCGTGGCCGCCGACGACGTGGCGGTCATCTC

CAAGCACGACACCTCGACGCTGGCCAACGATCCCAACGAGACCGAGTTGCATGAACGGGTCGC 

CGACGCCCTGGGCCGTTCCGAGGGCGCCCCGCTGTTCGTGGTGTCGCAGAAGAGCCTGACCG 

GCCACGCCAAGGGCGGCGCGGCGGTCTTCCAGATGATGGGGCTCTGCCAGATATTGCGGGAT 

GGGGTGATCCCACCCAACCGCAGCCTCGACTGCGTCGACGACGAGCTGGCCGGCTCCGCGCA 

TTTCGTGTGGGTGCGTGACACGTTGCGGCTCGGCGGCAAGTTCCCACTCAAGGCCGGCATGCT 

GACCAGCCTCGGGTTCGGCCATGTGTCGGGCCTGGTCGCGTTGGTGCATCCGCAGGCGTTCAT 

CGCCTCGCTGGATCCCGCACAGCGCGCGGACTACCAGCGGCGTGCCGACGCCCGCCTGCTGG 

CCGGTCAGCGCCGGCTGGCCTCGGCGATTGCCGGTGGTGCGCCGATGTACCAGCGGCCCGGT 

GACCGTCGCTTCGACCACCACGCGCCCGAGCGGCCGCAGGAGGCGTCGATGCTGCTGAATCC 

GGCGGCCCGGCTGGGTGACGGCGAGGCGTATATCGGCTGA 

>Rv2555c alaS alanyl-tRNA synthase TB.seq 2873772:2876483 MW:97326
>emb|AL123456|MTBH37RV:c2876483-2873769, alaS SEQ ID NO:101 

GTGCAGACACACGAGATCAGGAAGCGGTTCCTCGATCATTTCGTGAAGGCGGGCCACACCGAG 

GTGCCCAGCGCCTCGGTGATCCTCGACGACCCCAACCTGTTGTTCGTCAACGCCGGGATGGTC 

CAGTTCGTGCCTTTCTTCTTGGGACAGCGCACGCCGCCGTACCCGACGGCCACCAGCATCCAG 

AAGTGCATCCGTACCCCCGATATCGACGAGGTGGGCATAACCACCCGGCACAACACGI I I I I IC 

AGATGGCCGGCAATTTCAGCTTCGGCGACTATTTCAAACGCGGGGCCATTGAACTGGCCTGGG 

CACTGCTGACCAACAGCCTCGCCGCCGGCGGCTACGGCCTGGACCCGGAAAGAATCTGGACG 

ACAGTCTATTTCGACGACGACGAAGCTGTCCGGCTATGGCAGGAGGTTGCCGGGCTGCCGGCG 

GAGCGAATCCAGCGCCGCGGCATGGCCGACAACTACTGGTCGATGGGCATTCCCGGACCGTG 

CGGGCCGTCATCGGAGATCTATTACGACCGCGGACCCGAATTCGGTCCCGCAGGCGGTCCCAT 

CGTCAGCGAAGACCGCTACCTCGAGGTCTGGAACCTGGTGTTCATGCAGAACGAGCGCGGAGA 

GGGAACCACCAAGGAGGACTACCAGATCCTCGGGCCGCTGCCCCGCAAGAACATCGACACCG 

GCATGGGCGTCGAGCGGATCGCGCTGGTGCTGCAAGACGTGCACAACGTCTACGAGACCGAC 

CTGCTCAGGCCGGTCATCGATACCGTGGCCAGGGTCGCCGCGCGTGCCTACGACGTCGGCAA 

CCACGAAGACGACGTGCGGTACCGCATCATGGCAGACCACAGCCGCACCGCCGCGATCCTGAT 

CGGTGACGGCGTCAGCCCCGGCAACGACGGTCGCGGTTATGTGCTGCGCCGGCTGCTGCGTC 

GGGTGATCCGCTCCGCCAAGCTGCTGGGCATCGACGCTGCGATCGTTGGCGACCTGATGGCCA 

CGGTGCGCAACGCGATGGGCCCGTCATATCCCGAACTCGTCGCCGACTTCGAGCGGATCAGGC 

GGATCGCGGTCGCCGAGGAGACGGCGTTCAACCGCACGCTGGCGTCGGGTTCCAGGCTGTTC 

GAGGAGGTGGCTAGCTCCACCAAGAAATCCGGAGCCACCGTGCTGTCCGGATCGGACGCTTTC 

ACGTTGCATGACACCTACGGGTTCCCGATCGAGCTCACGCTGGAGATGGCGGCCGAAACCGGT 

CTGCAGGTAGACGAAATCGGGTTCCGTGAGCTGATGGCCGAGCAGCGCCGCCGTGCCAAGGC 

CGACGCCGCCGCGCGCAAACACGCGCATGCTGACCTGAGCGCCTACCGCGAGCTGGTTGACG 

CCGGCGCCACCGAGTTCACCGGATTCGACGAGTTGCGTTCCCAGGCGCGGATTCTGGGCATCT 
TCGTCGACGGTAAGCGGGTTCCGGTGGTGGCGCACGGTGTAGCCGGCGGAGCCGGGGAAGG

GCAGCGTGTCGAACTTGTCTTAGATCGCACCCCGCTCTACGCCGAATCGGGTGGGCAGATCGC 

CGATGAGGGCACCATCAGCGGAACCGGTTCCAGCGAAGCTGCCCGGGCCGCGGTTACCGACG 

TGCAGAAGATCGCCAAAACGCTTTGGGTGCACCGAGTCAACGTGGAATCCGGGGAATTCGTCG 

AGGGTGACACCGTAATCGCGGCGGTGGATCCCGGGTGGCGCCGGGGTGCCACGCAGGGCCA 

CTCGGGCACCCACATGGTGCATGCCGCGCTGCGACAAGTGCTGGGGCCCAACGCGGTTCAGG 

CGGGATCGCTGAACCGGCCGGGATATTTGCGCTTCGACTTTAACTGGCAGGGTCCGTTGACCG 

ACGACCAGCGCACCCAGGTCGAAGAGGTCACCAACGAGGCCGTGCAAGCGGACTTCGAGGTG 

CGCACGTTCACCGAACAGCTCGACAAGGCCAAGGCGATGGGTGCCATCGCGCTGTTCGGCGAG 

AGCTACCCCGACGAAGTGCGGGTGGTGGAGATGGGTGGACCGTTCTCGCTGGAGCTATGTGGC 

GGCACCCATGTGAGCAACACX3GCGCAGATCGGTCCCGTGACGATCCTGGGCGAGTCGTCGATC 

GGCTCCGGGGTGCGCCGGGTGGAGGCCTACGTGGGGTTGGATTCGTTTCGTCACCTGGCCAA 

GGAGCGTGCGTTGATGGCCGGGTTGGCCTCGTCACTGAAGGTGCCGTCCGAAGAGGTACCGG 

CCCGGGTGGCCAATCTAGTGGAGCGCCTGCGGGCCGCCGAGAAGGAACTCGAACGTGTCCGG 

ATGGCCAGCGCCCGGGCAGCCGCCACCAATGCCGCCGCCGGGGCTCAGCGGATCGGTAACGT 

CCGTTTGGTGGCGCAGCGAATGTCCGGCGGGATGACCGCGGCAGACCTGCGGTCGTTGATCG 

GCGACATCCGCGGCAAGCTGGGTAGCGAGCCGGCGGTGGTGGCGCTGATTGCCGAGGGCGAA 

AGCCAAACTGTGCCGTATGCGGTCGCGGCCAATCCCGCTGCCCAGGACCTCGGAATCCGTGCC 

AACGACCTGGTCAAACAACTTGCGGTGGCGGTCGAAGGCCGCGGTGGCGGTAAGGCGGACCT 

GGCGCAGGGCTCGGGAAAGAATCCGACCGGTATCGACGCCGCGCTCGACGCGGTCCGCTCCG 

AGATCGCCGTGATAGCGCGGGTCGGTTGA 

>Rv2580c hisS histidyl-tRNA synthase TB.seq 2904822:2906090 MW:45118
>emb|AL123456|MTBH37RV:c2906090-2904819. hisS SEQ ID NO:102 

GTGACGGAATTCTCGTCATTTTCGGCCCCCAAGGGGGTACCGGACTACGTCCCGCCCGACTCG 

GGGCAGTTCGTCGCGGTGCGCGACGGGCTGCTCGCGGCGGCCCGTCAAGCXIGGCTATAGCCA 

CATCGAGCTGCCCATCTTCGAGGACACCGCCCTGTTCGCCCGGGGCGTGGGTGAATCCACCGA 

CGTGGTGTCCAAGGAGATGTATACGTTCGCCGACCGTGGCGACCGCTCGGTGACGCTGCGGCC 

C6AGGGCACCGCCGGGGTGGTGCGTGCGGTGATCGAACACGGGCTGGATCGCGGCGCGCTG 

CCGGTGAAGTTGTGTTATGCGGGCCCGTTTTTCCGCTACGAGCGTCCGCAGGCCGGCCGGTAT 

CGCCAGTTACAGCAAGTCGGGGTGGAGGCGATCGGCGTCGACGACCCGGCGTTGGACGCCGA 

GGTGATCGCCATTGCCGACGCCGGGTTCCGCTCGTTGGGTCTCGACGGGTTCCGGCTGGAAAT 

CACCTCX:CTGGGAGACGAGAGTTGCCGTCCGCAGTACCGGGAACTGTTGCAGGAGTTCTTGTTT 

GGACTCGATCTCGACGAGGACACCCGCAGGCGCGCAGGGATCAATCCGCTGCGGGTGCTCGA 

CGACAAGCGACCCGAATTGCGTGCGATGACGGCGTCGGCGCCGGTGTTGCTGGATCATCTGTC 

TGATGTCGCCAAGCAGCATTTCGACACCGTGCTCGCCCATCTGGACGCGCTTGGAGTGCCCTAT 

GTCATCAACCCGCGCATGGTGCGCGGCCTGGACTACTACACCAAGACCGCCTTCGAGTTCGTC 
CATGACGGGCTTGGTGCGCAATCGGGGATCGGCGGCGGGGGGCGCTACGACGGCCTGATGCA

CXAGCTTGGCGGGCAGGACTTGTCGGGCATCGGGTTCGGGCTGGGCGTGGACCGGACCGTGC 

TGGCGCTGCGGGCCGAGGGCAAGACGGCGGGGGACAGCGCCCGGTGCGACGTGTTCGGCGT 

GCCGCTTGGCGAGGCGGCCAAGCTCAGGCTGGCGGTGCTGGCTGGACGACTGCGCGCGGCC 

GGGGTGCGGGTTGACCTTGCCTATGGTGATCGCGGGCTCAAAGGCGCGATGCGCGCGGCCGC 

TCGTTCCGGCGCCCGTGTTGCGTTGGTAGCGGGCGACCGCGACATCGAGGCCGGGACGGTCG 

CAGTGAAGGACTTGACGACGGGTGAGCAAGTTTCGGTCTCGATGGATTCGGTTGTGGCCGAAG 

TAATTTCGCGGCTGGCTGGGTAG 

>Rv2614c thrS threonyl-tRNA synthase TB.seq 2941190:2943265 MW:77123
>emb|AL123456|MTBH37RV:c2943265-2941187. thrS SEQ ID NO:103 

ATGAGCGCCCCCGCACAACCCGCCCCGGGAGTCGATGGCGGCGACCCGTCGCAAGCCCGAAT 

TCGGGTTCCTGCCGGGACCACCGCGGCCACCGCCGTCGGCGAAGCGGGTTTACCGCGGCGCG 

GTACGCCCGATGCGATCGTCGTCGTGCGCGACGCCGACGGCAACCTGCGCGACCTGAGCTGG 

GTGCCCGACGTCGACACCGATATCACGCCGGTGGCCGCCAACACCGACGACGGTCGCAGCGT 

GATCCGCCATTCGACCGCGCACGTGTTGGCCCAAGCCGTCCAAGAGCTGTTTCCGCAGGCCAA 

GCTCGGCATCGGACCACCCATCACCGACGGCTTCTACTACGACTTCGACGTGCCCGAGCCGTT 

CACGCCCGAGGACTTGGCGGCGCTGGAAAAGCGGATGCGCCAGATCGTCAAGGAAGGCCAGC 

TGTTCGACCGGCGGGTCTAGGAATCCACCGAACAGGCCCGCGCCGAGCTGGCCAACGAGCCC 

TACAAGCTGGAACTCGTCGACGACAAATCGGGTGACGCCGAGATCATGGAGGTCGGCGGTGAC 

GAGCTCACCGCCTACGACAACCTCAACCCCCGCACCCGCGAGCGCGTCTGGGGCX3ACCTGTG 

CCGCGGACCGCACATCCCGACCACCAAACACATCCCGGCGTTCAAGCTCACCCGCAGCTCGGC 

CGCCTACTGGCGGGGCGATCAGAAAAACGCCAGCCTGCAACGGATCTACGGCACCGCGTGGG 

AATCCCAGGAGGCGCTCGACAGGCACCTGGAGTTCATCGAAGAGGCGCAGCGCCGCGACCAC 

CGCAAGCTGGGTGTCGAGCTGGACCTGTTCAGCTTCCCCGACGAAATCGGTTCCGGCCTAGCG 

GTTTTCCACCCCAAGGGCGGCATCGTGCGTCGCGAACTGGAGGACTACTCGCGGCGCAAGCAC 

ACCGAGGCGGGCTACCAGTTCGTCAACAGCCCGCACATCACCAAGGCCCAGTTGTTCCACACC 

TCGGGACATCTGGACTGGTACGCCGACGGCATGTTCCCCCCGATGCACATCGACGCGGAGTAC 

AACGCCGACGGCTCGCTGCGCAAACCCGGCCAGGACTACTACCTCAAGCCGATGAACTGCCCG 

ATGCACTGCCTGATCTTCCGCGCGCGCGGGCGATCCTATCGGGAACTGCCGTTGCGGCTCTTC 

GAGTTCGGCACGGTGTATCGCTACGAGAAGTCCGGTGTGGTGCACGGGTTGACCCGGGTGCGT 

GGGCTGACCATGGACGACGCGCACATCTTCTGCACCCGCGACCAGATGCGCGACGAGCTGCG 

GTCGCTGCTGCGGTTTGTGCTCGACCTGCTCGCCGACTACGGCCTCACCGACTTCTACCTCGAA 

CTGTCCACCAAGGACCCGGAGAAGTTCGTCGGCGCCGAGGAGGTCTGGGAGGAAGCCACCAC 

CGTGCTGGCCGAGGTGGGCGCCGAATCCGGGCTGGAGCTGGTGCCCGATCCAGGCGGCGCG 

GCGTTCTACGGGCCCAAGATTTCAGTGCAGGTCAAAGACGCGCTGGGCCGCACCTGGCAGATG 

TCGACCATCCAGCTGGACTTCAACTTTCCGGAACGTTTCGGCCTGGAGTACACCGCCGCCGACG 
GAACCCGCCACCGCCCGGTGATGATCCACCGCGCGCTATTTGGGTCGATCGAGCGGTTCTTCG

GCATTCTCACCGAGCACTACGCGGGGGCGTTCCCGGCCTGGTTGGCGCCCGTGCAGGTGGTC 

GGCATCCCGGTCGCCGATGAGCACGTCGCCTATCTGGAAGAGGTTGCCACGCAACTGAAGTCG 

CACGGGGTGCGGGCCGAGGTGGACGCCAGCGACGATCGGATGGCCAAGAAGATCGTGCACCA 

CACCAACCACAAGGTGCCGTTCATGGTGTTGGCGGGTGATCGTGACGTCGCCGCCGGCGCGGT 

GAGTTTCCGGTTCGGTGACCGCACCCAAATCAACGGTGTGGCCCGTGACGATGCGGTGGCGGC 

CATTGTCGCCTGGATCGCTGACCGCGAAAATGCGGTTCCTACAGCX3GAACTGGTGAAAGTGGC 

CGGTCGTGAGTGA 

>Rv2697c dut deoxyuridine triphosphatase TB.seq 3013683:3014144 MW:15772 
>emb|AL123456|MTBH37RV:c3014144-3013680. dut SEQ ID NO:104 

GTGTCGACCACTCTGGCGATCGTCCGCCTCGACCCCGGGCTCCX^GCTGCCCAGCCGCGCTCAC 

GACGGCGACGCCGGCGTTGATCTCTACAGCGCCGAAGACGTCGAGCTGGCACCTGGGCGCCG 

CGCCCTGGTACGGACGGGTGTTGCGGTCGCCGTCCCGTTCGGCATGGTCGGGCTGGTCCATC 

CGCGCTCCGGGTTGGCCACGCGGGTGGGGCTTTCGATCGTCAACAGTCCGGGCACCATCGAC 

GCGGGTTATCGTGGGGAGATCAAGGTGGCCCTGATCAACTTGGACCCAGCCGCGCCCATCGTG 

GTACATCGCGGTGACCGAATCGCCCAGTTGCTAGTGCAACGGGTTGAGTTGGTCGAGCTGGTC 

GAGGTCTCGTCGTTCGACGAGGCCGGGCTGGCCTCGACATCCCX3CGGCGACGGTGGCCACGG 

TTCCTCCGGCGGACATGCGAGTTTGTGA 

>Rv2782c pepR protease/peptidase, M16 family (insulinase) TB.seq 3089045:3090358 MW:47074 
>emb|AL123456|MTBH37RV:c3090358-3089042. pepR SEQ ID NO:105

ATGCCGCGACGGTCACCAGCTGACCCCGCGGCGGCGCTGGCGCCGCGGCGCACCACCCTGC 

CGGGCGGGCTGCGAGTGGTCACCGAATTCCTGCCCGCGGTGCACTCCGCGTCGGTCGGGGTG 

TGGGTCGGCGTCGGATCGCGCGACGAAGGCGCCACGGTGGCCGGGGCGGCGCACTTCCTTGA 

GCATTTGCTGTTCAAGTCGACGCCCACCCGCTCTGCCGTGGACATTGCGCAGGCGATGGACGC 

GGTGGGCGGGGAACTGAACGCATTCACCGCCAAGGAGCACACCTGCTACTACGCCCACGTGCT 

CGGCAGCGACTTGCCGTTGGCCGTCGACCTGGTCGCCGATGTGGTGCTCAACGGCCGCTGTGC 

CGCCGACGATGTCGAGGTGGAACGTGACGTCGTCCTCGAGGAGATCGCGATGCGCGACGACG 

ACCCCGAGGACGCCTTGGCGGACATGTTCCTGGCGGCGTTGTTCGGCGACCACCCGGTCGGTC 

GCCCGGTGATCGGCAGCGCGCAATCCGTGTCGGTGATGACGCGGGCTCAACTGCAATCGTTTC 

ACCTGCGGCGCTATACCCCGGAGCGGATGGTCGTCGCGGCCGCCGGCAATGTGGATCACGAC 

GGGCTGGTTGCGTTGGTCCGCGAGCACTTCGGGTCCCGGTTGGTCCGGGGGAGACGGCCAGT 

TGCGCCGCGCAAGGGTACCGGCCGGGTCAACGGCAGCCCCCGGTTGACACTGGTTAGCCGCG 

ACGCCGAACAGACGCATGTGTCGCTGGGCATGCGCACACCCGGGCGCGGCTGGGAGCATCGT 

TGGGCACTGTCGGTGCTGCACACCGCGCTGGGCGGTGGCTTGAGTTCCCGGCTGTTCCAGGAG 

GTCCGCGAGACCCGCGGGCTGGCCTACTCGGTCTACTCCGCGCTGGATCTCTTCGCCGACAGC 
GGCGCGCTTTCGGTGTACGCGGCCTGCCTGCCCGAACGCTTCGCCGACGTGATGCGGGTGAC

CGCCGATGTGCTGGAAAGCGTGGCACGCGACGGCATCACCGAGGCGGAATGCGGCATCGCCA 

AGGGATCGCTGCGGGGTGGGCTGGTGCTAGGGCTGGAGGATTCCAGCTCCCGGATGAGCCGG 

CTCGGCCGCAGCGAGTTGAACTACGGCAAGCACCGCAGCATCGAACACACCTTGCGGCAAATC 

GAGCAGGTCACCGTGGAGGAGGTCAACGCGGTGGCCCGCCACCTGCTGAGCAGGCGCTACGG 

TGCTGCGGTTCTTGGCCCACACGGATCGAAACGATCACTGCCGCAACAACTTCGAGCGATGGTA 

GGGTAG 

>Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase TB.seq 3090339:3092594 MW:79736
>emb|AL123456|MTBH37RV:c3092594-3090336. gpsI SEQ ID NO:106

ATGTCTGCCGCTGAAATTGACGAAGGCGTGTTCGAGACGACCGCCACCATCGACAACGGGAGC 

TTTGGCACCCGGACCATCCGCTTCGAGACCGGCCGATTGGCCTTGCAGGCCGCCGGCGCGGT 

GGTCGCCTACCTCGACGACGACAACATGCTGCTGTCGGCGACCACCGCCAGCAAGAACCCCAA 

AGAACACTTCGACTTCTTCCCCCTCACGGTCGACGTCGAGGAGCGCATGTATGCGGCCGGCCG 

CATCCCCGGTTCGTTCTTCCGTCGCGAGGGCCGACCCTCCACCGACGCGATCCTGACCTGCCG 

GCTCATCGACCGCCCGCTGCGCCCGTCGTTTGTCGACGGGCTGCGCAACGAGATCCAAATCGT 

GGTGACGATTCTCAGCCTGGATCCGGGCGATCTCTACGACGTATTGGCGATCAACGCGGCGTC 

GGCGTCCACCCAGCTGGGCGGTCTGCCGTTCTCCGGGCCCATCGGCGGTGTGCGGGTGGCGC 

TCATCGACGGCACCTGGGTCGGCTTCCCCACCGTCGACCAGATCGAGCGCGCCGTGTTCGACA 

TGGTCGTGGCCGGCCGGATCGTCGAGGGTGATGTTGCCATCATGATGGTCGAAGCCGAGGCCA 

CCGAAAACGTCGTCGAGCTCGTCGAAGGTGGTGCCCAAGCGCCGACGGAAAGCGTGGTGGCC 

GCGGGCCTGGAGGCGGCCAAGCCGTTTATCGCCGCGCTGTGCACCGCGCAGCAGGAGCTTGC 

CGATGCCGCTGGAAAGTCGGGCAAACCGACCGTCGACTTCCCGGTGTTCCCTGACTACGGCGA 

AGACGTGTACTACTCGGTGTCCTCGGTGGCCACCGACGAGTTGGCCGCCGCGTTGACCATCGG 

CGGTAAAGCCGAGCGCGACCAGCGCATCGACGAAATCAAGACCCAGGTTGTGCAGCGGCTCGC 

CGACACCTACGAGGGTCGCGAAAAGGAGGTCGGCGCCGCGTTGCGTGCCCTGACCAAAAAGCT 

GGTTCGGCAGCGCATCCTCACCGACCATTTCCGTATCGACGGCCGCGGCATCACCGACATTCG 

CGCATTGTCGGCCGAGGTGGCCGTGGTTCCGCGCGCGCACGGCAGCGCGCTGTTCGAACGCG 

GCGAAACCCAGATCCTGGGTGTGAGCACACTCGACATGATCAAGATGGCCCAGCAGATCGACT 

CGTTGGGGCCGGAGACATCGAAGCGGTACATGCACCACTACAACTTCCCGCCGTTCTCCACCG 

GCGAGACCGGTCGGGTCGGTTCGCCCAAGCGGCGTGAGATCGGGCACGGCGCACTGGCCGA 

GCGGGCCCTGGTGCCGGTGTTGCCGAGCGTCGAGGAATTCCCGTATGCCATTCGCCAGGTGTC 

GGAGGCTCTGGGCTCCAACGGGTCGACCTCGATGGGGTCGGTGTGCGCGTCGACGCTGGCGC 

TGCTGAACGCCGGGGTGCCGCTCAAGGCGCCGGTGGCCGGCATCGCGATGGGCCTGGTCTCC 

GACGACATTCAAGTAGAAGGGGCGGTCGACGGCGTTGTGGAGCGTCGCTTCGTCACCCTCACC 

GACATCCTCGGCGCCGAAGAGGCGTTCGGTGACATGGACTTCAAGGTCGCCGGGACCAAGGAC 
TTCGTCACCGCGCTGCAGCTGGACACCAAGCTCGACGGGATCCCTTCGCAGGTGCTTGCCGGA
GCACTCGAGCAGGCCAAGGACGCCCGCCTCACGATCTTGGAGGTGATGGCTGAGGCCATCGAT 
AGACCCGACGAAATGAGTCCCTACGCCCCGCGGGTGACCACCATCAAGGTTCCGGTGGACAAG 
ATCGGGGAGGTCATCGGACCCAAGGGCAAGGTCATCAACGCCATCACCGAGGAGACCGGCGC 
GCAGATCTCCATCGAAGACGACGGCACCGTGTTCGTCGGCGCCACCGACGGGCCATCGGCACA
GGCCGCGATCGACAAGATCAACGCCATCGCCAACCCGCAGCTGCCGACGGTGGGCGAACGGT 
TCCTCGGAACCGTGGTCAAGACCACCXBATTTCGGTGCCTTTGTATCGTTGCTGCCTGGC^ 
CGGTCTGGTGCACATTTCCAAACTCGGCAAGGGCAAGCGCATCGCGAAGGTCGAGGACGTTGT 
CAATGTCGGTGACAAGCTGCGGGTGGAGATCGCCGACATCGACAAACGGGGCAAGATCTCCCT 
GATCCTGGTCGCCGACGAGGACAGCACCGCCGCCGCTACCGATGCCGCGACGGTCACCAGCT
GA 

>Rv2793c truB tRNA pseudouridine 55 synthase TB.seq 3102364:3103257 MW:31821 
>emb|AL123456|MTBH37RV:c3103257-3102361. truB SEQ ID NO:107

ATGAGCGCAACCGGCCCCGGAATCGTGGTTATCGACAAGCCCGCGGGAATGACCAGCCATGAC
GTGGTGGGGCGGTGCCGCCGCATCTTCGCCACCCGGCGGGTCGGCCACGCGGGCACCCTGG 
ACCCGATGGCCACCGGGGTGTTGGTGATCGGCATCGAACGCGCCACCAAGATCCTCGGTCTGC 
TGACGGCGGCCCCCAAGTCGTATGCCGCCACCATCCGCTTGGGTCAGACCACTTCCACCGAGG 
ACGCCGAAGGTCAAGTGCTGCAGTCGGTTCCGGCTAAGCACCTGACCATCGAGGCGATCGACG 

CCGCGATGGAGCGGCTGCGCGGTGAGATCCGGCAGGTGCCGTCGTCGGTCAGCGCGATCAAG
GTCGGTGGCCGACGCGCCTATCGGTTGGCCCGCCAGGGGCGCTCCGTGCAATTGGAAGCCCG 
GCCGATCCGCATCGACCGGTTCGAGCTGCTGGCCGCACGCCGGCGCGACCAGCTCATCGATAT 
CGATGTGGAGATCGACTGCTCCTCGGGAACCTACATCCGCGCGTTGGCACGCGACCTCGGCGA 
CGCGCTTGGGGTGGGAGGCCATGTGACGGCGTTGCGGCGCACCCGCGTCGGCCGCTTCGAGC 

TGGACCAGGCGAGATCGCTCGACGATCTCGCGGAGCGCCCCGCGCTGAGCCTGAGCCTCGAT
GAGGCCTGCCTGCTGATGTTTGCGCGCCGCGACCTGACCGCCGCGGAGGCCAGCGCGGCCGC 
CAACGGCCGGTCCCTGCCGGCGGTCGGTATCGACGGCGTGTACGCGGCCTGTGACGCCGACG 
GCCGGGTTATCGCGCTGCTGCGTGACGAGGGTTCGCGGACCAGGTCGGTGGCGGTGCTCCGG 
CCGGCGACGATGCACCCCGGGTAG 

>Rv2797c - TB.seq 3105619:3107304 MW:58761 >emb|AL123456|MTBH37RV:c3107304-3105616. 
Rv2797c SEQ ID NO:108

GTGCCACTGACCGTGGCCGATATCGATCGGTGGAACGCGCAAGCGGTCCGGGAGGTGTTTCAC 
GCGGCCAGTGCCCGAGCGGAGGTGACGTTCGAGGCGTCGCGTCAGTTGGCCGCGCTGTCGAT 
TTTTGCGAACTCGGGTGGCAAGACCGCTGAGGCGGCGGCACACCACAACGCGGGCATTCGCC
GAGACCTCGACGCCCACGGCAACGAGGCGTTGGCGGTTGCCCGGGCGGCCGACAGGGCCGC 
CGACGGGATTGTGAAGGTTCAGTCCGAGCTGGCCGCACTACGCCATGCCGCCGCGGCCGCCG 
AGCTGACGATCGATGCGCTGATCAACCGGGTGGTGCCGATCXXCGGGCTGCGATCCACCGAG

GCGCAGTGGGCGCGGACGCTGGCCAAGCAAACGGAGCTGCAGGCGGAGCTGGATGCGATTAT 

GGCCGAGGCCAATGCCGTCGACGAGGAGCTGGCCTCAGCGGTCAATATGGCCGACGGTGACG 

CGCCCATCCCGGCCGATTCCGGCCCGCCGGTCGGTCCCGAGGGGCTGACCCCGACCCAGCTC 

GCCAGCGATGCCAACGAGGAGCGGCTGCGCGAGGAGCGCGCCCGCCTGCAGGCCCACCTCG 

AGCGGTTACAGGCGGAGTATGACCAACTGAGTGTGCGGGCCGCCCGTGACTACCACAACGGCA 

TCCTCGACGGTGACGCGGTGGGCCGACTGGCAGCGCTTACCGACGAGCTGAGCGCXJGCCAGG 

GGCCGGCTGGGTGAGCTCGATGCCGTCGACGAGGCGTTGAGCCGAGCACCCGAGACCTACCT 

GACCCAGCTGCAGATTCCCGAGGACCCAAATCAGCAGGTGCTGGCGGCCGTGGCXiGTCGGTAA 

TCCCGACACCGCCGCCAATGTGTCGGTGACGGTTCCCGGCGTCGGGTCCACCACCCGGGGCG 

CCCTGCCCGGCATGGTGACCGAAGCCCGCGACCTGCGGTCGGAGGTAATCCGGCAACTCAATG 

CTGCCGGCAAGCCCGCATCGGTTGCCACCATCGCCTGGATGGGCTACCACCC6CCCCCGAACC 

CACTCGACACCGGCAGTGCGGGCGATCTGTGGCAGACCATGACCGATGGGCAGGCACACGCG 

GGCGCGGCCGATCTGTCGCGGTATTTGCAGCAGGTGCGCGCCAATAACCCCAGTGGCCACCTG 

ACCGTGTTGGGGCACTCGTATGGGTCGCTGACGGCGTCGCTGGCGTTGCAGGACCTCGATGCC 

CAGAGCGCCCATCCGGTCAACGACGTCGTGTTTTACGGCTCACCCGGCTTGGAGCTGTACAGC 

CCGGCGCAGCTCGGGCTCGATCACGGGCACGCTTATGTCATGCAGGCCCCCCACGACCTCATC 

ACCAATCTGGTGGCGCCGTTGGCGCCGCTGCACGGATGGGGCCTGGACCCCTATCTGACCCCC 

GGGTTCACGGAGCTGTCGTCACAGGCGGGTTTTGATCCGGGCGGGATCTGGCGTGACGGAGT 

GTATGCCCACGGGGACTACCCGCGGTCCTTCCTCGATGCCGCCGGCCAGCCGCAGCTGCGGA 

TGTCCGGCTATAACCTGGCGGCGATCGCCGCCGGGCTGCCCGACAACACGGTGGGCCCGCCG 

CTGCTTCCGCCAATTCTGGGTGGCGGCATGCCGGCAGCGCCCGGCCCAGCACTGAGAGGGGG 

ACGTTGA 

>Rv2864c ponA2 TB.seq 3175454:3177262 MW:63015
>emb|AL123456|MTBH37RV:c3177262-3175451. Rv2864c SEQ ID NO:109

ATGGTAACTAAAACAACATTAGCCTCAGCCACCTCAGGTTTGCTGCTGCTTGCGGTCGTCGCCAT 

GTCGGGCTGCACCCCGCGTCCCCAAGGGCCCGGTCCGGCGGCCGAAAAGTTCTTCGCCGCGC 

TGGCCATCGGTGACACCGCCTCCGCCGCCCAGCTCAGCGACAACCCCAACGAGGCGCGCGAA 

GCGCTGAACGCGGCCTGGGCGGGGCTGCAGGCCGCCCACCTGGATGCGCAGGTTCTCAGCGC 

GAAGTACGCCGAGGACACCGGTACGGTCGCTTATCGCTTCAGCTGGCATCTGCCCAAGGACCG 

AATCTGGACCTATGACGGCCAGCTGAAGATGGCCCGCGACGAAGGGCGTTGGCACGTTCGCTG 

GACCACCAGCGGGTTGCATCCCAAGCTAGGCGAACATCAAACGTTCGCGCTACGAGCCGACCC 

GCCGCGGCGCGCCTCGGTGAACGAAGTCGGCGGCACCGATGTGCTGGTGCCGGGCTATCTGT 

ATCACTACTCGCTGGACGCCGGCCAGGCCGGCCGCGAGCTCTTCGGCACGGCACACGCGGTG 

GTGGGCGCGCTGCACCCCTTCGACGACACGCTCAATGATCCGCAGCTGCTGGCCGAACAGGCC 

AGCTCGTCGACCCAGCCGTTGGACCTGGTCACGTTGCACGCCGACGACAGCAACCGGGTGGC 
CGCGGCGATCGGGCAGCTGCCTGGCGTGGTGATCACACCGCAGGCCGAGCTGCTCCCGACCG

ACAAGCACTTCGCGCCGGCGGTCCTCAACGATGTCAAGAAGGCCGTCGTCGATGAACTCGACG 

GCAAGGCGGGTTGGCGGGTGGTGAGCGTCAACCAAAATGGCGTCGACGTCTCGGTGCTGCAC 

GAGGTCGCCCCATCACCTGCGTCGTCGGTTTCGATCACGTTGGATCGGGTCGTGCAAAACGCC 

GCGCAACACGCGGTGAACACCCGGGGCGGCAAGGCGATGATCGTCGTGATCAAGCCGTCGAC 

CGGCGAGATCCTGGCGATCGCGCAGAACGCCGGGGCCGATGCGGACGGTCCGGTCGCGACCA 

CCGGTCTATATCCACCCGGGTCGACATTCAAGATGATCACCGCCGGTGCGGCCGTCGAGCGTG 

ACCTGGCTACCCCTGAGACGCTGCTGGGTTGCCCCGGGGAGATCGACATCGGGCATCGCACCA 

TTCCCAACTACGGTGGCTTTGATCTGGGCGTGGTGCCGATGTCACGCGCGTTTGCCAGTTCCTG 

CAACACCACCTTCGCCGAGCTGAGCAGCAGGCTGCCTCCCCGCGGTCTGACTCAGGCGGCCC 

GGCGGTACGGGATCGGGCTTGACTACCAGGTGGACGGCATCACCACGGTGACCGGTTCGGTG 

CCGCCGACGGTGGACCTGGCCGAACGCACCGAGGACGGTTTCGGCCAGGGCAAGGTGCTGGC 

CAGCCCGTTCGGCATGGCCTTGGTGGCGGCGACGGTAGCCGCCGGGAAGACCCXIGGTTCCAC 

AGCTGATCGCCGGCCGGCCGACGGCCGTCGAAGGCGATGCCACACCGATCAGCCAGAAGATG 

ATCGACGCGCTGCGGCCCATGATGCGGTTGGTGGTGACCAATGGCACCGCCAAGGAGATCGCT 

GGCTGTGGCGAGGTGTTCGGTAAGACCGGCGAAGCCGAATTCCCGGGCGGATCGCATTCCTG 

GTTCGCCGGGTACCGTGGCGATCTGGCATTTGCGTCGCTGATCGTCGGGGGCGGTAGCTCGGA 

ATACGCGGTGCGGATGACCAAGGTGATGTTCGAATCGCTGCCGCCGGGGTACCTGGCGTAG 

>Rv2868c gcpE TB.seq 3179368:3180528 MW:40451
>emb|AL123456|MTBH37RV:c3180528-3179365. gcpE SEQ ID NO:110

gtgactgtaggcttgggcatgccgcagcccccggcacccacgctcgctccccggcgcgccac 

ccgtcagctgatggtcggcaacgtcggcgtgggcagtgaccatccggtctcggtgcaatcgat 

gtgcaccaccaaaacccacgacgtcaactcgacattgcaacaaatcgccgagctgaccgcggc 

cggatgcgacatcgtgcgggtggcctgcccgcgccaggaggacgccgacgcgctggccgag 

atcgcccggcacagccagatcccggtagtcgcggacatacatttccagcx:gcgctacatattcg 

ccgccatcgacgctggatgtgccgcggtgcgggtcaacccgggcaacatcaaggagtttgacg 

gccgggtgggtgaggtcgccaaggcggcgggtgcggccgggatcccgatcxjgaatcggtgt 

caacgccggttcgctggacaaacggttcatggagaagtatggcaaagccacgcccgaggcgct 

ggttgagtcggcgctgtgggaggcttcgcttttcgaggagcatggcttcggtgacatcaagat 

cagcgtcaagcacaacgacccggtggtgatggtcgccgcctacgagctgcttgctgcacggtg 

cgactacccactgcacctcggtgtcaccgaggccggccctgctttccagggcaccatcaagtc 

cgcggttgccttcggcgcgttgctgtcgcggggcataggcgacaccatccgggtgtcgttgtc 

ggccccgccggtcgaggaagtcaaggtgggcaatcaggttctcgagtcgttgaacctgcggcc 

gcgttcgctcgagatcgtgtcttgcccgtcgtgcggtcgcgcgcaagtcgacgtctacaccct 

ggccaacgaggtaaccgccggcctggatggtctcgatgtgccgttgcgggtggccgtgatgg 

ggtgtgtcgtcaatggtccgggtgaagcacgtgaggccgacctgggcgtggcgtccggcaac 
GGCAAAGGTCAGATCTTTGTACGGGGCGAAGTGATCAAGACCGTGCCXGAAGCACAGATCGTC

GAGACGCTGATCGAGGAGGCGATGCGGCTGGCCGCCGAAATGGGCGAGCAAGATCCGGGCGC 

GACACCGAGCGGTTCGCCTATTGTGACCGTAAGCTGA 

>Rv2869c - TB.seq 3180548:3181759 MW:42835 >emb|AL123456|MTBH37RV:c3181759-3180545. 
Rv2869c SEQ ID NO:111

ATGATGTTTGTTACCGGCATTGTGCTGTTCGCGCTCGCGATCCTGATTTCGGTGGCCCTGCACG 

AATGTGGTCACATGTGGGTCGCGCGCCGCACCGGGATGAAGGTACGTCGCTATTTCGTCGGCT 

TTGGCCCCACGTTGTGGTCGACGCGGCGCGGCGAGACCGAATACGGTGTCAAAGCCGTTCCGC 

TGGGCGGCTTCTGTGACATCGCCGGCATGACCCCGGTCGAGGAACTCGACCCCGACGAACGTG 

ACCGTGCGATGTACAAGCAGGCCACCTGGAAGCGGGTCGCAGTGTTATTCGCCGGGCCCGGAA 

TGAACCTCGCTATCTGCCTGGTGCTGATCTATGCCATCGCGCTGGTCTGGGGGCTGCCTAACCT 

GCATCCGCCAACCAGGGCCGTAATCGGCGAAACTGGCTGCGTTGCACAGGAAGTGAGCCAGG 

GCAAGCTCGAGCAGTGCACCGGGCCCGGTCCGGCGGCGCTGGCCGGAATTCGCTCCGGTGAC 

GTCGTGGTCAAGGTCGGTGACACCCCGGTGTCCAGTTTCGACGAGATGGCCGCCGCGGTGCG 

CAAGTCACACGGCAGCGTCCCGATCGTTGTCGAGCGTGACGGCACCGCGATTGTTACX^TACGT 

GGACATCGAATCCACCCAACGCTGGATCCCTAACGGGCAGGGCGGTGAGCTCCAGCCGGCAAC 

GGTCGGTGCGATTGGGGTGGGCGCCGCCCGGGTCGGGCCTGTGCGCTACGGCGTGTTCTCCG 

CCATGCCGGCCACATTCGCGGTCACCGGCGACCTGACCGTGGAGGTGGGCAAGGCGCTGGCC 

GCCCTCCCGACCAAGGTAGGTGCGCTGGTGCGGGCGATCGGCGGCGGGCAGCGTGACCCGC 

AGACGCCGATAAGTGTGGTGGGCGCCAGCATCATCGGCGGCGACACCGTCGACCATGGGCTG 

TGGGTGGCGTTGTGGTTCTTCTTGGCCCAGCTGAACCTCATCCTGGCTGCGATCAACCTGCTGC 

CGTTGCTGCCGTTCGATGGCGGCCATATTGCCGTCGCGGTGTTCGAGAGGATCCGCAACATGG 

TCCGGTCGGCTCGTGGCAAGGTGGCGGCCGCACCGGTGAATTACCTCAAACTCTTGCCGGCGA 

CCTATGTGGTCTTGGTTCTTGTCGTCGGGTACATGCTCTTGACCGTCACCGCCGACCTGGTCAA 

CCCGATTAGGCTTTTCCAGTAG 

>Rv2870c - TB.seq 3181770:3183077 MW:45324 >emb|AL123456|MTBH37RV:c3183077-3181767.
Rv2870c SEQ ID NO:112

GTGGCTACCGGTGGACGCGTCGTGATCCGGCGGCGCGGTGACAACGAGGTGGTGGCGCACAA 

TGATGAGGTGACCAACTCGACCGACGGGCGCGCTGACGGCCGGTTGCGGGTGGTGGTGCTGG 

GCAGTACCGGCTCGATCGGCACCCAGGCGCTTCAGGTCATCGCCGACAATCCGGAGCGTTTCG 

AGGTAGTCGGGCTGGCCGCTGGCGGCGCCCATCTGGACACGTTGCTGCGACAACGTGCGCAG 

ACGGGGGTGACCAATATTGCCGTCGCTGACGAGCACGCGGCGCAGCGGGTCGGCGACATCCC 

CTAGCACGGATCCGACGCCGCCACCCGGCTGGTCGAGCAGACCGAGGCCGACGTCGTCCTCA 

ATGCGCTGGTCGGCGCGTTGGGCCTGCGACCGACGTTGGCCGCGCTCAAGACGGGTGCCCGG 

CTGGCGCTGGCCAACAAGGAATCGCTGGTCGCCGGTGGTTCGCTGGTGCTGCGGGCGGCGCG 
GCCCGGTCAGATCGTGCCGGTCGACTCCGAACACTCCGCGCTGGCCCAGTGCCTGCGCGGCG

GCACTCCCGACGAGGTCGCCAAGCTGGTGCTGACGGCCTCGGGAGGGCCGTTTCGGGGCTGG 

TCCGCGGCCGACCTCGAGCATGTCACCCCCGAGCAGGCTGGCGCGCATCCTACX3TGGTCGATG 

GGCCCGATGAACACGCTGAATTCGGCGTCGCTGGTCAACAAGGGACTTGAGGTCATCGAAACC 

CACCTGCTGTTCGGCATCCCCTACGACCGCATCGATGTCGTGGTGCACCCCCAGTCGATCATCC 

ATTCGATGGTCACCTTCATCGACGGTTCGACGATCGCCCAGGCCAGTCCCCCGGACATGAAGCT 

ACCGATTTCGTTAGCGCTGGGCTGGCCGCGTCGGGTCAGCGGCGCCGCTGCTGCCTGTGATTT 

CCATACCGCGTCGAGCTGGGAGTTCGAGCCGTTGGACACCGACGTCTTCCCCGCGGTCGAGTT 

GGCCCGGCAGGCCGGCGTAGCCGGTGGCTGCATGACCGCGGTTTACAATGCGGCGAACGAAG 

AAGCAGCAGCGGCGTTCCTTGCTGGCCGGATCGGCTTCCXJGGCCATCGTCGGCATCATCGCCG 

ACGTGTTGCACGCTGCCGACCAATGGGCCGTCGAACCCGCTACCGTGGATGACGTACTCGACG 

CGCAGCGCTGGGCCCGCGAGCGAGCGCAGCGCGCGGTATCTGGTATGGCTTCGGTGGCGATC 

GCAAGCACGGCGAAGCCGGGCGCAGCGGGTCGACACGCATCGACGTTAGAAAGGTCCTGA 

>Rv2922c smc member of Smc1/Cut3/Cut14 family TB.seq 3234189:3238055 MW:139610 
>emb|AL123456|MTBH37RV:c3238055-3234186. smc SEQ ID NO:113 

GTGGGTGCAGGGAGTCGGTTTCCGCTGGTGGACCCGCTGCCGAGCGTTGGAGCTCGGCCTGA 

CCGGTTACGCGGCCAACCACGCCGACGGACGCGTGCTGGTGGTCGCCCAGGGTCCGCGCGCT 

GCGTGCCAGAAGCTGCTGCAGCTGCTGCAGGGCGACACGACACCGGGCCGCGTCGCCAAAGT 

CGTCGCCGACTGGTCGCAGTCGACGGAGCAGATCACCGGGTTCAGCGAGCGGTAATCTGGCC 

CCTCGTGTACCTCAAGAGTCTGACGTTGAAGGGCTTCAAGTCCTTCGCCGCGCCGACGACTTTA 

CGCTTCGAGCCGGGCATTACGGCCGTCGTTGGGCCCAACGGCTCCGGCAAATCCAATGTGGTC 

GATGCCCTGGCGTGGGTGATGGGGGAGCAGGGGGCAAAGACGCTGCGCGGCGGCAAGATGG 

AAGACGTCATCTTCGCCGGCACCTCGTCGCGTGCGCCGCTGGGCCGCGCCGAAGTCACCGTTA 

GCATCGACAACTCCGACAACGCACTGCCTATCGAATACACCGAGGTGTCGATCACCCGAAGAAT 

GTTTCGCGACGGTGCCAGCGAATACGAAATCAACGGCAGCAGTTGCCGTTTGATGGATGTGCA 

GGAGTTGCTGAGCGACTCCGGCATCGGCCGTGAGATGCATGTGATTGTTGGGCAAGGGAAGCT 

CGAGGAGATCTTGCAGTCGCGGCCTGAGGATCGGCGGGCGTTCATCGAGGAAGCCGCCGGTG 

TGCTCAAGCATCGCAAGCGCAAGGAAAAAGCTCTGCGCAAACTCGACACGATGGCGGCGAACC 

TGGCCCGGCTCACCGATCTGACCACCGAGCTCCGGCGTCAACTCAAACCGCTGGGCCGGCAG 

GCCGAGGCGGCCCAGCGTGCCGCGGCCATCCAAGCCGATCTGCGCGACGCCGGGCTGCGCCT 

GGCGGCCGACGACTTGGTAAGCCGCAGAGCCGAACGGGAAGCGGTCTTTCAGGCCGAGGCTG 

CGATGCGCCGCGAGCATGACGAGGCCGCCGCCCGGCTGGCGGTGGCATCCGAGGAGCTGGC 

CGCGCATGAGTCCGCGGTCGCCGAACTCTCGACGCGGGCCGAGTCGATCCAGCACACTTGGTT 

CGGGCTGTCTGCGCTGGCCGAACGGGTGGACGCTACGGTGCGCATCGCCAGCGAACGCGCCC 

ATCATCTCGATATCGAGCCGGTAGCGGTCAGCGACACCGACCCCAGAAAGCCCGAGGAGCTAG 

AAGCCGAGGCCCAGCAGGTGGCCGTCGCCGAGCAACAACTGTTAGCGGAGCTGGACGCGGCG 
CGTGCCCGACTCGATGCTGCCCGTGCAGAGCTGGCCGACCGGGAGCGCCGCGCCGCCGAGG

CCGACCGGGCACACCTGGCGGCGGTCCGGGAGGAGGCGGACCGCCGTGAGGGACTGGCGCG 

GCTGGCTGGCCAGGTGGAGACCATGCGGGCGCGTGTCGAATCGATCGATGAGAGCGTGGCAC 

GGTTGTCCGAGCGGATCGAGGATGCCGCAATGCGCGCCCAGCAGACCCGAGCCGAGTTCGAA 

ACCGTGCAGGGCCGCATCGGTGAACTGGATCAAGGCGAGGTCGGCCTGGATGAGCACCACGA 

GCGTACTGTGGCCGCGTTGCGGTTGGCCGACGAACGCGTCGCCGAGCTGCAATCCGCCGAAC 

GCGCCGCCGAACGCXAGGTGGCATCGCTACGGGCTCGCATCGATGCGCTCGCAGTGGGGCTA 

CAGCGCAAGGACGGCGCGGCGTGGCTGGCXSCACAATCGCAGTGGCGCAGGGCTTTTCGGTTC 

GATCGCCCAATTGGTGAAGGTACGTTCCGGCTATGAAGCGGCACTGGCCGCGGCGCTCGGGC 

CGGCGGCCGACGCACTTGCGGTGGACGGCCTGACTGCCGCGGGTAGTGCCGTCAGCGCACTC 

AAACAAGCCGACGGCGGTCGCGCGGTCCTCGTGCTGAGTGACTGGCCGGCCCCGCAAGCCCC 

CCAATCCGCCTCGGGGGAGATGCTGCCTAGCGGCGCCCAGTGGGCCCTAGACCTGGTCGAGT 

CTCCACCGCAGTTGGTTGGCGCGATGATCGCCATGCTTTCGGGTGTCGCGGTGGTCAACGACC 

TGACTGAGGCAATGGGCCTGGTCGAGATTCGTCCGGAGCTACGCGCGGTCACCGTTGACGGTG 

ATCTGGTGGGCGCCX3GCTGGGTCAGCGGCGGATCGGACCGCAAGCTGTCCACCTTGGAGGTC 

ACCTCCGAGATCGACAAGGCCAGGAGTGAGCTGGCCGCTGCCGAGGCGCTGGCGGCGCAATT 

GAATGCGGCCCTGGCCGGTGCGCTGACCGAGCAGTCCGCCCGCCAGGACGCGGCCGAGCAA 

GCCTTGGCCGCGCTTAACGAATCCGACACGGCCATGTCGGCGATGTACGAGCAGCTGGGCCGC 

CTCGGGCAGGAGGCCCGCGCGGCGGAAGAAGAGTGGAACCGGTTGCTGCAGCAGCGTACGGA 

ACAGGAAGCCGTGCGCACACAGACTCTCGACGACGTCATACAACTTGAGACCCAGCTGCGTAA 

GGCCCAGGAGACCCAACGGGTGCAGGTGGCCCAACCGATCGACCGCCAGGCGATCAGTGCCG 

CTGCCGATCGCGCCCGCGGTGTCGAAGTGGAAGCCCGGCTGGCGGTGCGCACCGCCGAGGAA 

CGCGCCAACGCGGTTCGCGGGCGGGCCGATTCGCTGCGCCGTGCGGCTGCGGCGGAACGTG 

AGGCGGGGGTGCGGGCTCAGCAAGCACGCGCCGCAAGACTGCATGCGGCCGCGGTGGCCGC 

AGCGGTCGCCGACTGCGGACGGCTGCTGGCCGGGCGGTTGCACCGGGCGGTGGACGGGGCG 

TCGCAACTGCGCGAGGCGTCGGCCGCGCAACGTCAGCAGCGGTTAGCGGCGATGGCCGCGGT 

GCGCGACGAGGTGAACACGCTGAGCGCCCGAGTGGGGGAACTCACCGATTCGCTGCACCGCG 

ACGAGCTGGCTAACGCGCAGGCGGCGCTGCGTATCGAGCAGCTTGAGCAGATGGTGCTAGAG 

CAGTTCGGAATGGCGCCGGCCGACTTGATCACCGAATACGGTCCACATGTGGCGCTACCACCG 

ACCGAGCTCGAGATGGCTGAGTTCGAGCAAGCCCGCGAACGCGGCGAGCAGGTGATTGCGCC 

CGCCCCCATGCCGTTCGACCGGGTTACCCAGGAGCGCCGGGCCAAACGCGCCGAGCGTGCGC 

TTGCCGAGTTGGGCAGGGTCAACCCGCTGGCGCTCGAAGAGTTTGCTGCCTTGGAGGAGCGCT 

ACAATTTCCTGTCCACCCAACTCGAGGATGTCAAGGCTGCCCGCAAGGATCTGCTGGGCGTCGT 

CGCCGATGTTGACGGCCGCATCCTGCAGGTGTTCAATGACGCGTTCGTAGACGTGGAACGCGA 

ATTTCGCGGCGTGTTCACCGCATTGTTCCCCGGTGGTGAAGGACGGCTGCGGCTGACCGAGCC 

CGACGACATGCTCACCACCGGCATCGAGGTCGAAGCCCGCCCGCCGGGCAAGAAGATTACCC 

GACTGTCTTTGCTCTCCGGTGGCGAGAAGGCGCTGACCGCGGTGGCGATGCTGGTCGCGATCT 
TTCGTGCCCGTCCATCGCCGTTCTACATCATGGACGAGGTGGAGGCCGCCCTCGACGACGTGA
ACCTGCGCCGACTGCTCAGCCTGTTCGAACAGCTGCGAGAGCAGTCGCAGATCATCATCATCAC 
CCACCAGAAGCCGACGATGGAGGTCGCGGACGCACTGTACGGCGTAACCATGCAGAACGACG 
GCATCACCGCGGTCATCTCGCAGCGCATGCX3CGGTCAGCAGGTGGATCAGCTGGTTACCAATT 

CCTCGTAG 

>Rv2925c rnc RNAse III TB.seq 3239829:3240548 MW:25400
>emb|AL123456|MTBH37RV:c3240548-3239826. rnc SEQ ID NO:114

ATGATCCGGTCACGACAACCCCTGCTCGACGCACTCGGTGTGGACCTCCCX3GACGAGCTGCTC 

TCACTGGCGTTGACCCACCGCAGCTACGCCTACGAGAACGGCGGGCTGCCGACCAACGAGCGT 

TTGGAGTTTCTCGGCGATGCCGTGCTAGGGCTGACCATCACCGACGCGCTGTTCCATCGTCATC 

CTGATCGGTCGGAGGGGGATCTGGCCAAACTGCGGGCCAGCGTAGTCAACACCCAGGCCCTG 

GCCGACGTCGCACGCCGCCTCTGTGCGGAAGGCCTCGGTGTTCACGTGCTATTGGGTCGCGGC 

GAGGCGAACACCGGCGGGGCCGACAAGTCCAGCATTCTGGCCGACGGTATGGAATCGCTGCT 

GGGCGCGATCTACCTGCAACACGGTATGGAGAAGGCCCGTGAGGTGATCCTGCGGCTGTTTGG 

CCCGTTGCTGGACGCCGCGCCGACCCTGGGTGCGGGATTGGATTGGAAGACCAGCTTGCAGG 

AGCTGACTGCAGCGCGAGGGCTGGGTGCGCCGTCATACCTGGTCACCTCCACCGGCCCGGAC 

CACGATAAGGAATTCACCGCGGTGGTTGTCGTGATGGACAGCGAATACGGTTCAGGAGTGGGC 

CGGTCCAAAAAAGAAGCCGAGCAAAAAGCCGCGGCGGCCGCTTGGAAAGCCCTGGAAGTGCTC 

GACAACGCCATGCCGGGCAAAACCTCCGCCTAA 

>Rv2934 ppsD TB.seq 3262245:3267725 MW:193317
>emb|AL123456|MTBH37RV:3262245-3267728. ppsD SEQ ID NO:115 

ATGACAAGTCTGGCGGAGCGCGCGGCGCAACTGTCGCCGAACGCGCGAGCGGCCCTGGCGCG 

CGAGCTCGTCCGTGCGGGTACGACCTTCCCGACCGACATCTGCGAGCCGGTGGCGGTGGTGG 

GCATCGGCTGTCGCTTTCCGGGGAATGTGACTGGGCCAGAGAGCTTTTGGCAGCTACTGGCCG 

ACGGTGTGGACACAATCGAGCAGGTGCCGCCTGATCGGTGGGATGCGGACGCGTTCTACGATC 

CCGATCCTTCGGCGTCGGGTCGGATGACGACGAAATGGGGTGGTTTCGTTTCCGATGTCGACG 

CGTTCGACGCCGACTTTTTCGGAATCACTCCTCGGGAAGCCGTGGCGATGGACCCGCAGCATC 

GGATGCTGCTCGAGGTTGCCTGGGAAGCGTTGGAGCACGCGGGTATTCCGCCGGATTCCTTGA 

GCGGCACTCGAACCGGCGTGATGATGGGTCTGTCGTCGTGGGACTACACGATCGTCAATATCG 

AGCGCAGAGCCGACATCGACGCGTACCTGAGCACCGGAACCCCGCACTGTGCCGCGGTGGGG 

CGGATCGCGTATCTGTTGGGATTGCGTGGTCCGGCCGTCGCCGTAGATACCGCTTGTTCGTCGT 

CGCTGGTGGCAATTCACTTGGCGTGTCAGAGCCTTCGCCTGCGTGAAACCGACGTGGCATTGG 

CGGGCGGGGTGCAGCTCACCTTGTCACCGTTCACCGCCATCGCGCTGTCCAAGTGGTCGGCGC 

TGTCACCGACCGGCCGATGCAACAGCTTCGACGCCAACGCGGATGGATTCGTGCGCGGCGAG 

GGCTGCGGCGTGGTGGTGCTCAAGCGGTTGGCCGACGCGGTGCGCGACCAGGACCGGGTGCT 
TGCGGTGGTCCGCGGTTCGGCAACTAACTCCGATGGTCGGTCCAACGGCATGACCGCACCGAA
CGCGCTGGCGCAGCGTGACGTGATCACATCCGCCCTCAAGCTTGCGGATGTTACCCCTGACAG
CGTGAACTATGTCGAAACACACGGCACCGGAACGGTGTTGGGGGACCCCATCGAGTTCGAGTC 
GCTGGCGGCCACTTATGGCCTGGGTAAAGGCCAGGGCGAGAGCCCGTGCGCATTGGGGTCGG 
TCAAGACCAACATCGGCCACCTGGAGGCGGCCGCCGGTGTGGCTGGATTCATCAAGGCGGTGC
TGGCGGTGCAACGTGGGCACATTCCCCGCAACTTGCACTTCACCCGGTGGAACCCGGCCATCG 
ACGCGTCGGCGACGCGGCTGTTCGTGCCGACCGAAAGCGCCCCGTGGCCGGCGGCTGCCGGT 
CCACGCAGGGCTGCGGTGTCATCGTTCGGCCTCAGCGGGACCAACGCGCACGTGGTGGTCGA 
GCAGGCACCCGACACCGCAGTAGCCGCAGCCGGCGGCATGCCGTATGTTTCGGCGCTGAACG 

TCTCCGGCAAGACGGCCGCGCGGGTGGCGTCGGCGGCGGCGGTGCTGGCCGACTGGATGTC
GGGGCCGGGCGCGGCGGCACCACTGGCCGACGTGGCACACACGTTGAACCGQCACCGGGCC 
CGGCACGCCAAGTTCGCCACCGTCATCGCGCGTGACCGCGCCGAGGCGATCGCGGGGTTGCG 
AGCGCTGGCGGCCGGACAACCACGCGTTGGGGTGGTGGATTGCGACCAGCATGCCGGTGGGC 
CTGGCCGGGTTTTTGTGTATTCGGGTCAGGGCTCGCAGTGGGCGTCGATGGGCCAGCAGTTGC 

TGGCCAACGAACCGGCGTTCGCCAAGGCGGTAGCCGAGCTGGATCCGATATTCGTTGACCAGG
TTGGCTTTTCGCTGCAGCAAACGCTTATCGACGGCGACGAGGTGGTGGGCATCGACCGCATCC 
AGCCGGTGCTGGTCGGGATGCAGTTGGCGCTGACCGAGTTATGGCGGTCCTATGGGGTGATTC 
CAGATGCCGTGATCGGGCACTCGATGGGTGAGGTGTGGGCGGCAGTGGTGGCCGGCGCGTTG 
ACGCCCGAGCAGGGCTTGCGGGTCATCACCACCCGGTCGCGGTTGATGGCGCGGCTGTCGGG 

GCAGGGAGCGATGGCGCTGCTCGAGCTGGATGCCGACGCCGCCGAGGCGCTGATTGCCGGCT
ATCCGCAGGTGACGCTGGCGGTGCATGCGTCACCGCGCCAGACGGTGATCGCCGGGCCGCCC 
GAGCAGGTGGACACGGTGATCGCGGCGGTAGCGACGCAAAACCGGTTGGCGCGCCGCGTCGA 
AGTCGACGTGGCCTCCCATCACCCGATCATCGATCCCATACTGCCCGAGTTGCGAAGCGCGTTA 
GCGGATTTGACTCCGCAGCCGCCGAGCATCCCGATCATTTCCACTACGTACGAAAGCGCGCAG 

CCGGTGGCGGATGGCGACTATTGGTCGGCCAACCTGCGCAACCCGGTGCGATTCCACCAGGCC
GfCACCGCCGCCGGTGTCGACCACAACACCTTCATCGAAATCAGCCCTCACGCCGTGCTCACG 
CACGCACTCACCGACACCCTGGATCCGGACGGCAGCCATACAGTCATGTCGACGATGAACCGC 
GAACTGGACCAGACGCTGTATTTCCACGCGCAACTCGCCGCGGTCGGTGTGGCTGCGTCCGAG 
CACACCACCGGTCGCCTTGTCGACCTGCCCCCCACACCGTGGCACCATCAGCGATTCTGGGTC 

ACGGATCGTTCGGCGATGTCCGAGCTGGCCGCGACCCACCCGGTCCTGGGCGCGCACATCGA
GATGCCGCGCAACGGAGACCATGTCTGGCAGACCGATGTCGGCACCGAGGTCTGTCCCTGGTT 
GGCAGACCACAAGGTGTTCGGTCAACCCATCATGCCGGCCGCGGGGTTCGCCGAGATCGCCTT 
GGCGGCGGCCAGCGAAGCCCTCGGCACAGCCGCCGACGCCGTCGCACCCAACATCGTGATCA 
ACCAGTTCGAGGTGGAGCAGATGCTGCCCCTCGACGGCCACACGCCGCTAACGACGCAGTTAA 

TTCGCGGCGGGGACAGCCAGATTCGGGTCGAGATCTATTCCCGCACGCGTGGCGGAGAGTTCT
GCCGACACGCCACGGCCAAGGTTGAACAATCGCCGCGCGAATGTGCGCACGCGCACGCGGAA 
GCCCAAGGTCCCGCCACCGGGACAACAGTGTCGCCGGCCGATTTTTATGCCCTGCTCCGCCAA 
ACCGGCCAACACCATGGTCCGGCGTTCGCGGCCTTAAGCCGGATCGTGCGCCTGGCCGATGGT
TCCGCGGAAACCGAGATCAGCATTCCCGACGAGGCGCCGCGCCATCCCGGGTATCGGCTGCA 
CCCCGTGGTATTGGATGCGGCATTGCAAAGCGTGGGTGCCGCGATACCCGACGGCGAGATCGC 
GGGGTCGGCGGAAGCCAGGTATCTGCCAGTGTCGTTCGAGACCATCCGGGTGTACCGCGACAT 

CGGTCGGCACGTCAGGTGTCGTGCCCACCTGACAAACCTCGACGGCGGCACCGGAAAGATGG
GCAGGATCGTCCTAATCAACGACGCCGGCCACATAGCGGCCGAAGTGGACGGCATCTATCTGC 
GTCGTGTCGAACGCCGTGCGGTACCCCTGCCACTAGAGCAGAAGATCTTCGATGCCGAATGGA 
CCGAAAGCCCGATCGCAGCCGTGCCGGCTCCGGAGCCAGCTGCCGAGACGACGCGGGGAAGT 
TGGCTGGTACTCGCCGATGCAACGGTGGATGGGCCAGGCAAGGCCCAGGCCAAGTCGATGGC 

CGACGACTTCGTGCAGCAGTGGCGCTCACCGATGCGGCGGGTGCACACCGCCGATATCCACGA
CGAATCGGCGGTGCTGGCCGCATTTGCAGAAACGGCAGGCGATCCCGAGCACCCGCCGGTTG 
GCGTGGTGGTGTTCGTCGGCGGTGCCTCGAGTCGACTGGACGACGAGCTGGCGGCGGCGCGC 
GACACGGTGTGGTCGATCACCACGGTGGTTCGTGCGGTCGTCGGCAGGTGGCACGGCCGATCA 
CCGCGGCTATGGCTGGTCACCGGGGGCGGACTTTCCGTTGCCGACGACGAGCCGGGAACACC 

CGGGGCGGCTTCCTTGAAAGGGCTGGTGCGGGTGCTCGCCTTCGAGCACCGGGACATGCGCA
CCACCCTGGTCGATCTGGACATCACACAAGACCCGCTGACCGCGCTGAGCGCGGAACTGCGGA 
ATGCCGGGAGTGGGTCGCGCCATGATGACGTGATCGCGTGGCGCGGCGAGCGCAGGTTCGTC 
GAACGGCTGTCGCGCGCCACGATCGATGTATCCAAAGGGCATCCGGTGGTGCGCCAGGGAGC 
GTCGTACGTCGTCACCGGCGGCCTCGGCGGTCTCGGCCTGGTCGTCGCTCGTTGGCTGGTGG 

ACCGCGGCGCCGGCCGGGTGGTGCTGGGTGGCCGCAGCGATCCCACTGACGAGCAGTGCAAC
GTCCTGGCCGAACTGCAGACCGGCGCCGAGATCGTGGTTGTCCGTGGCGACGTGGCATCGCC 
GGGGGTGGCAGAAAAGCTGATTGAGACGGCCCGACAGTCTGGGGGCCAATTGCGCGGCGTCG 
TGCACGCCGCCGCGGTCATCGAAGACAGCCTGGTGTTCTCTATGAGCAGGGACAACCTAGAAC 
GGGTGTGGGCACCCAAGGCCACCGGTGCGCTGCGCATGCACGAAGCCACCGCTGACTGCGAG 

CTCGACTGGTGGCTCGGATTCTCTTCCGCCGCTTCGCTATTGGGTTCTCCCGGGCAAGCGGCCT
ACGCGTGCGCCAGCGCGTGGCTGGACGCGCTGGTCGGATGGCGCAGGGCATCCGGCCTGCC 
GGCCGCGGTGATCAACTGGGGTCCGTGGTCGGAGGTAGGCGTCGCCCAGGCCTTGGTGGGCA 
GTGTTCTCGACACGATCAGTGTCGCAGAAGGCATCGAGGCTCTCGACTCATTGCTTGCCGCCGA 
CCGGATCCGCACTGGAGTGGCTCGGCTGCGTGCCGATCGGGCCCTGGTCGCATTCCCGGAGA 

TCCGCAGCATCAGCTACTTCACGCAGGTGGTCGAGGAGCTGGACTCGGCGGGTGACCTCGGCG
ACTGGGGCGGGCCCGACGCGCTTGCCGACCTCGACCCGGGCGAGGCGCGGCGCGCGGTGAC 
CGAGCGGATGTGTGCGCGCATCGCTGCGGTGATGGGCTACACTGACCAGTCGACTGTCGAACC 
CGCCGTGCCCTTGGACAAGCCCCTGACCGAGCTGGGGCTGGATTCTCTGATGGCGGTACGAAT 
ACGCAACGGCGCGCGGGCGGATTTCGGCGTGGAACCGCCGGTAGCGCTGATACTGCAAGGCG 

CGTCCTTGCATGACCTGACGGCGGACTTAATGCGCCAACTCGGGCTCAATGATCCCGATCCGG
CGCTCAACAACGCTGACACTATTCGCGAGCGGGCGCGCCAGCGCGCGGCAGCGCGACACGGA 
GCCGCGATGCGGCGCCGACCTAAACCTGAAGTACAGGGAGGATAA 
>Rv2946c pks1 TB.seq 3291503:3296350 MW:166642
>emb|AL123456|MTBH37RV:c3296350-3291500. pks1 SEQ ID NO:116

GTGATTTCGGCGAGATCGGCTGAGGCGTTGACGGCGCAGGCGGGTCGACTTATGGCCCACGTG 
CAGGCCAACCCAGGGCTGGATCCGATCGATGTGGGGTGCTCGTTGGCCAGTCGCTCGGTGTTT
GAGCACCGAGCGGTGGTGGTCGGCGCAAGCCGTGAGCAACTGATTGCCGGGCTGGCTGGGCT 
CGCGGCGGGCGAGCCGGGTGCCGGCGTGGCGGTCGGTCAGCCAGGGTCGGTGGGCAAGACG 
GTGGTCGTGTTTCCTGGGCAGGGCGCGCAGCGCATCGGGATGGGCCGCGAGTTGTACGGCGA 
GTTGCCCGTGTTTGCGCAGGCATTCGATGCGGTGGCCGACGAGTTGGACCGGCATCTGCGGTT 

GCCGCTGCGCGACGTTATTTGGGGTGCCGATGCGGATTTGCTTGACAGCACCGAATTTGCTCAG
CCCGCGTTGTTCGCGGTGGAGGTGGCATCGTTCGCGGTGTTGCGGGATTGGGGTGTGCTTCCG 
GACTTCGTCATGGGTCACTCCGTTGGAGAGCTGGCGGCGGCGCACGCGGCCGGTGTGTTGAC 
GTTGGCGGACGCGGCGATGCTGGTGGTGGCGCGGGGCCGGTTGATGCAGGCGCTGCCGGCA 
GGCGGTGCGATGGTGGCGGTGGCTGCCAGTGAGGACGAGGTGGAGCCGCTGCTGGGTGAGG 

GTGTGGGGATCGCTGCGATCAACGCGCCCGAATCGGTGGTGATCTCCGGTGCGCAGGCCGCG
GCAAATGCGATTGCGGATCGGTTCGCCGCGCAGGGTCGGCGGGTGCACCAGTTGGCGGTCTC 
GCATGCGTTTCATTCGCCGTTGATGGAGCCGATGCTCGAGGAGTTCGCGCGTGTCGCGGCCXJG 
GGTGCAGGCACGCGAGCCCCAGCTTGGGCTGGTGTCGAACGTGACGGGCGAGTTGGCCGGCC 
CTGATTTCGGGTCGGCGCAGTACTGGGTGGACCACGTTCGTCGGCCGGTGCGCTTCGCGGACA 

GTGCGCGTCATTTGCAGACCCTTGGGGCGACCCACTTCATCGAGGCCGGCCCGGGAAGTGGTT
TGACTGGCTCGATCGAGCAGTCCTTGGCCCCGGCTGAGGCGATGGTGGTGTCGATGCTGGGCA 
AAGACCGGCCCGAGCTGGCCTCGGCGCTCGGTGCTGCCGGTCAGGTGTTCACCACCGGTGTG 
CCGGTGCAGTGGTCGGCGGTGTTCGCCGGCTCGGGTGGACGGCGGGTGCAGCTGCCCACGTA 
TGCGTTTCAGCGACGGCGGTTTTGGGAGACGCGGGGCGCGGATGGGCCCGCCGATGCGGCCG 

GGTTGGGTCTGGGCGCGACCGAGCATGCCTTGTTGGGTGCGGTGGTCGAGCGGCCCGATTCT
GACGAGGTGGTGCTGACCGGCCGGTTGTCGCTTGCGGATCAGCCGTGGCTGGCCGACCACGT 
GGTGAACGGGGTGGTGCTGTTCCCCGGGGCGGGTTTTGTGGAGTTGGTGATCCGCGCCGGTG 
ATGAGGTCGGGTGCGCGCTCATCGAAGAGTTGGTGCTGGCCGCACCGTTGGTGATGCACCCGG 
GTGTCGGGGTTCAGGTGCAGGTGGTCGTCGGGGCTGCCGATGAATCCGGGCACCGTGCGGTG 

TCGGTGTATTCCCGCGGTGATCAATCCCAGGGTTGGTTGCTGAACGCCGAAGGCATGCTGGGG
GTGGCTGCCGCTGAGACGCCGATGGATTTGTCCGTGTGGCCGCCCGAGGGCGCGGAGAGTGT 
GGATATCTCGGACGGCTATGCGCAGTTGGCCGAGCGCGGTTATGCCTACGGCCCCGCGTTTCA 
GGGTCTGGTGGCGATCTGGCGGCGGGGGTCGGAGCTGTTCGCCGAAGTTGTAGCCCCCGGCG 
AGGCCGGCGTGGCCGTCGACXJGAATGGGGATGCATCCGGCGGTGTTGGACGCGGTGCTGCAT 

GCCCTCGGGCTGGCCGTCGAGAAGACCCAGGCGAGCACCGAGACGAGACTGCCGTTTTGCTG
GCGTGGGGTGTCGCTGCATGCCGGCGGCGCTGGACGGGTGCGGGCCCGCTTCGCGTCCGCG 
GGCGCGGATGCGATTTCCGTGGACGTCTGCGACGCCACTGGGCTGCCGGTGTTGACGGTGCG 
CTCGCTGGTTACTCGCCCGATAACCGCAGAACAGCTGCGCGCCGCCGTGACCGCGGCCGGCG

GTGCGTCCGATCAGGGGCCGCTGGAAGTGGTGTGGTCGCGGATCTCGGTGGTCAGCGGCGGC 

GCTAACGGGTCCGCCCCACCTGCCCCGGTGTCTTGGGCGGACTTTTGCGCCGGCAGTGATGGT 

GACGCCAGTGTCGTGGTGTGGGAACTCGAGTCTGCCGGTGGCCAAGCATCCTCGGTGGTGGG 

CTCGGTGTATGCGGCCACCCACACCGCCCTGGAGGTGTTGCAGTCCTGGCTCGGCGCGGATCG 

GGCGGCCAGGTTGGTGGTGTTGACCCATGGTGGCGTGGGGCTGGCTGGCGAGGACATCAGCG 

ACCTGGCCGCCGCCGCGGTGTGGGGCATGGCGCGTTCCGCGCAGGCCGAAAATCCCGGCCG 

GATCGTGTTGATCGACACCGATGCGGCGGTGGATGCCTCGGTGCTAGCCGGCGTCGGGGAAC 

CCCAGCTGCTGGTGCGCGGCGGCACTGTGCACGCCCCCCGGCTGTCCCCGGCCCCGGCGTTG 

CTAGCGTTACCGGCGGCAGAGTCGGCGTGGCGATTGGCCGCXJGGTGGTGGCGGGACCGTGGA 

GGATTTGGTGATCCAGCCCTGCCCGGAGGTACAGGCACCGCTACAGGCGGGGCAGGTGCGCG 

TGGCGGTGGCGGCCGTCGGGGTCAACTTCCGCGATGTGGTGGCCGCCCTAGGGATGTATCCC 

GGCCAGGCCCCACCGCTGGGTGCCGAAGGCGCCGGGGTGGTGCTTGAGACCGGTCCCGAAGT 

GACCGATCTTGCCGTCGGTGACGCCGTGATGGGATTCCTGGGCGGGGCCGGTCCGCTGGCGG 

TGGTGGATCAGCAACTGGTTACCCGGGTGCCGCAAGGCTGGTCGTTTGCTCAGGCAGCCGCTG 

TGCCGGTGGTGTTCTTGACGGCCTGGTACGGGTTGGCCGATTTAGCCGAGATCAAGGCGGGCG 

AATCGGTGCTGATCCATGCCGGTACCGGCGGTGTGGGCATGGCGGCTGTGCAGCTGGCTCGC 

CAGTGGGGCGTGGAGGTTTTCGTCACCGCCAGCCGTGGCAAGTGGGACACGCTGCGCGCCAT 

GGGGTTTGACGACGACCATATCGGCGATTCCCGCACATGCGAGTTCGAGGAGAAGTTCCTGGC 

GGTCACCGAGGGCCGCGGGGTTGATGTGGTGCTCGACTCGCTGGCCGGTGAGTTCGTGGATG 

CGTCGCTGCGCTTACTGGTCCGCGGTGGGCGTTTCCTCGAGATGGGCAAGACGGATATCCGCG 

ATGCGCAGGAGATCGCCGCTAATTATCCCGGCGTGCAGTATCGGGCGTTCGACCTGTCGGAGG 

CCGGCCCGGCACGCATGCAGGAGATGTTGGCCGAGGTGCGGGAGCTGTTCGACACCCGGGAG 

CTGCACCGGCTACCGGTCACCACGTGGGATGTGCGCTGCGCCCCGGCGGCCTTCCGGTTCATG 

AGCCAGGCCCGCCATATCGGCAAGGTTGTCTTAACCATGCCCTCGGCGTTGGCCGACCGGCTT 

GCCGACGGCACGGTGGTGATCACCGGTGCCACCGGGGCGGTTGGTGGGGTGTTGGCCCGCCA 

CCTGGTTGGCGCCTATGGGGTGCGTCATCTGGTGTTGGCCAGTCGGCGGGGCGATCGCGCGG 

AGGGAGCGGCCGAATTGGCCGCCGACTTGACGGAGGCCGGCGGCAAGGTGCAGGTGGTGGC 

CTGTGAGGTGGCCGATCGCGCTGCGGTAGCGGGGTTGTTTGCCCAGCTGTCGCGGGAGTACCC 

GCCGGTGCGCGGGGTGATTCATGCCGCCGGCGTGCTCGATGACGCAGTGATCACCTCGTTGAC 

ACCGGACCGCATCGATACGGTGTTGCGGGCCAAGGTGGACGCGGCGTGGAACCTGCACCAGG 

CCACCAGTGACCTGGATTTGTCGATGTTTGCGCTGTGCTCATCGATCGCGGCCACGGTCGGCTC 

GCCGGGGCAGGGCAACTACTCGGCGGCAAACGCGTTTCTGGACGGGTTGGCCGCTCACCGGC 

AGGCCGCAGGGTTGGCCGGGATATCACTGGCGTGGGGTTTGTGGGAACAGCCTGGCGGCATG 

ACCGCGCATTTGAGCAGCCGAGATCTGGCCCGCATGAGCCGCAGCGGGCTGGCTCCGATGAG 

CCCTGCCGAAGCGGTGGAATTGTTTGACGCTGCGCTGGCCATCGATCACCCTCTGGCGGTGGC 

CACGCTCTTGGACCGGGCTGCACTAGACGCCCGGGCCCAGGCCGGTGCGTTGCCGGCGCTGT 
TCAGCGGGCTCGCGCGCCGCCCACGCCGACGCCAAATCGACGACACCGGTGACGCCACCTCG

TCGAAGTCGGCGCTGGCTCAACGCCTACACGGGCTGGCCGCGGACGAACAACTCGAGCTGCTA 

GTGGGGCTGGTGTGTCTGCAGGCAGCGGCAGTGCTGGGTAGGCCCTCCGCCGAGGACGTCGA 

CCCCGACACCGAATTCGGCGACCTCGGTTTCGACTCATTAACGGCTGTGGAGTTACGCAACCGC 

CTCAAAACCGCCACCX3GACTGACGCTGCCA(X;TACCGTGATTrTCGATCATCCCACTCCCACTG 

CGGTCGCCGAGTATGTCGCCCAGCAAATGTCTGGCAGCCGCCCAACGGAATCCGGTGATCCGA 

CGTCGCAGGTTGTCGAACCCGCCGCCGCGGAAGTATCGGTCCATGCCTAG 

>Rv3014c ligA DNA ligase TB.seq 3372545:3374617 MW:75258
>emb|AL123456|MTBH37RV:c3374617-3372542. ligA SEQ ID NO:117

GTGAGCTCCCCAGACGCCGATCAGACCGCTCCCGAGGTGTTGCGGCAGTGGCAGGCACTGGC 

CGAGGAGGTGCGTGAGCACCAGTTCCGTTATTACGTGCGGGACGCGCCGATCATCAGCGACGC 

GGAATTCGACGAGCTGCTGCGCCGTCTGGAAGCCCTCGAGGAGCAGCATCCCGAGCTGCGCA 

CGCCCGATTCGCCGACCCAGCTGGTCGGCGGTGCCGGCTTCGCCACGGATTTCGAGCCCGTC 

GACCATCTCGAACGAATGCTCAGCCTCGACAACGCGTTCACCGCCGACGAACTCGCCGCCTGG 

GCCGGCCGCATCCATGCCGAGGTCGGAGACGCCGCACATTACCTGTGTGAGCTCAAGATCGAC 

GGCGTCGCGCTGTCTTTGGTCTACCGCGAGGGACGGCTGACCCGGGCCTCCACCCGCGGCGA 

CGGGCGCACCGGCGAGGACGTCACCCTGAACGCCCGGACCATCGCCGACGTTCCCX3AACGGC 

TCACCCCCGGCGACGACTACCCGGTGCCCGAGGTCCTCGAGGTCCGCGGCGAGGTCTTCTTCC 

GGCTGGACGACTTCCAGGCGCTCAACGCCAGCCTCGTCGAGGAGGGCAAGGCGCCGTTCGCC 

AACCCCCGCAACAGCGCGGCGGGATCGCTGCGCCAGAAAGACCCGGCGGTCACCGCGCGCCG 

CCGGCTGCGGATGATCTGCCACGGGCTGGGCCACGTGGAGGGCTTTCGCCCGGCCACCCTGC 

ATCAGGCATACCTGGCGTTGCGGGCATGGGGACTGCCGGTTTCCGAACACACCACCCTGGCAA 

CCGACCTGGCCGGTGTGCGCGAGCGCATCGACTACTGGGGCGAGCACCGCCACGAGGTGGAC 

CACGAAATCGACGGCGTGGTGGTCAAAGTCGACGAGGTGGCGTTGCAGCGCAGGCTGGGTTC 

CACGTCGCGGGCGCCGCGCTGGGCCATCGCCTACAAGTACCCGCCCGAGGAAGCGCAGACCA 

AGCTGCTCGACATCCGGGTGAACGTCGGCCGCACCGGGCGGATCACGCCGTTTGCGTTCATGA 

CGCCGGTGAAGGTGGCCGGGTCGACGGTGGGACAGGCCACCCTGCACAACGCCTCGGAGATC 

AAGCGCAAGGGCGTGCTGATCGGCGACACCGTGGTGATCCGCAAGGCCGGCGACGTGATCCC 

CGAGGTGCTGGGACCCGTCGTCGAACTGCGCGATGGCTCCGAACGCGAATTCATCATGCCCAC 

CACCTGCCCGGAGTGCGGTTCGCCGTTGGCGCCGGAGAAGGAAGGCGACGCCGACATCCGTT 

GCCCCAACGCCCGCGGCTGCCCGGGGCAACTGCGGGAGCGGGTTTTCCACGTCGCCAGCCGC 

AACGGCCTAGACATCGAGGTGCTCGGTTACGAGGCGGGTGTGGCGCTCTTGCAGGCGAAGGT 

GATCGCCX3ACGAGGGCGAGCTGTTCGCGCTGACCGAGCGGGACTTGCTGCGCACCGACCTGT 

TCCGAACCAAGGCAGGCGAACTGTCGGCCAACGGCAAACGGCTGCTGGTCAACCTCGACAAGG 

CCAAGGCGGCACCGCTGTGGCGGGTGCTGGTGGCGCTGTCCATCCGCCATGTCGGGCCGACG 

GCGGCCCGCGCCCTGGCCACCGAGTTCGGCAGCCTTGACGCCATCGCCGCGGCGTCCACCGA 
CCAGCTGGCCGCCGTCGAGGGGGTGGGGCCGACCATTGCCGCCGCGGTCACCGAGTGGTTCG

CCGTCGACTGGCACCGCGAGATCGTCGACAAGTGGCGGGCCGCCGGGGTGCGAATGGTCGAC 

GAGCGTGACGAGAGTGTGCCACGCACGCTGGCCGGGCTGACCATCGTGGTCACCGGCTCGCT 

GACCGGTTTCTCCCGCGACGACGCCAAGGAGGCGATCGTGGCCCGCGGCGGCAAGGCCGCCG 

GCTCGGTGTCGAAGAAGACCAACTATGTCGTCGCCGGAGACTGGCCGGGATCCAAATACGACA 

AGGCGGTGGAGTTGGGGGTGCCGATTCTGGACGAGGATGGGTTCCGGAGACTGCTGGCCGAC 

GGACCCGCGTCACGAACGTAA 

>Rv3025c - NifS-like protein TB.seq 3383885:3385063 MW:40948
>emb|AL123456|MTBH37RV:c3385063-3383882. Rv3025c SEQ ID NO:118 

ATGGCCTACCTGGATCACGCTGCCACCACCCCGATGCACCCCGCCGCCATCGAGGCGATGGCG 

GCCGTGCAGCGCACCATCGGCAATGCGTCGTCGCTGCACACCAGCGGGCGCTCGGCGCGCCG 

GCGGATCGAGGAGGCCCGTGAGCTGATCGCGGACAAGCTAGGCGCTCGTCCGTCCGAGGTGA 

TCTTCACCGCGGGCGGCACCGAAAGCGACAACCTGGCTGTCAAAGGTATCTATTGGGCACGCC 

GCGATGCGGAGCCGCACCGCCGTCGCATCGTCACXJACCGAGGTGGAACACCACGCCGTACTG 

GACTCGGTGAACTGGCTCGTGGAACACGAAGGCGCCCATGTGACCTGGCTGCCGACCGCCGC 

CGACGGCTCGGTGTCGGCAACTGCGCTGCGCGAGGCACTGCAGAGCCACGACGACGTCGCGC 

TGGTATCGGTGATGTGGGCCAACAACGAGGTCGGAACTATTCTACCGATCGCCGAAATGTCAGT 

TGTCGCCATGGAATTCGGCGTGCCGATGCACAGTGATGCCATTCAGGCGGTGGGACAGCTCCC 

GCTTGACTTCGGGGCCAGCGGGCTGTCGGCGATGAGCGTGGCCGGGCACAAATTCGGTGGCC 

CGCCAGGAGTGGGTGCGTTGCTGCTGCGCCGCGACGTCACCTGCGTGCCCCTTATGGACGGC 

GGTGGGCAGGAGCGCGATATTCGTTCCGGCACACCCGATGTCGCCAGTGCAGTTGGAATGGCG 

ACGGCCGCGCAGATCGCGGTGGACGGACTCGAGGAAAACAGCGCGCGGTTACGGCTGCTGCG 

GGATCGTCTGGTCGAGGGTGTGCTGGCTGAGATTGACGATGTTTGCCTTAACGGCGCCGATGA 

CCCGATGCGGCTAGCGGGTAACGCGCACTTCACTTTCCGTGGCTGCGAAGGCGATGCGCTGTT 

GATGTTGTTGGACGCTAACGGAATCGAGTGCTCAACCGGATCGGCCTGCACGGCAGGTGTAGC 

GCAGCCCTCGCATGTGTTGATTGCAATGGGCGTCGACGCGGCCAGCGCCCGCGGATCATTGCG 

TCTCTCGCTGGGGCACACCAGTGTTGAGGCTGATGTCGATGCCGCGTTGGAGGTGCTTCCCGG 

GGCGGTGGCACGTGCACGGCGGGCCGCCCTAGCCGCCGCGGGAGCATCCCGATGA 

>Rv3080c pknK serine-threonine protein kinase TB.seq 3442656:3445985 MW:119420
>emb|AL123456|MTBH37RV:c3445985-3442653. pknK SEQ ID NO:119

ATGACCGACGTTGATCCGCACGCGACGCGGCGGGACCTGGTCCCGAATATTCCGGCGGAACTG 

CTTGAGGCTGGATTCGACAATGTCGAGGAGATCGGGCGCGGCGGATTCGGCGTCGTCTACCGC 

TGCGTCCAGCCCTCGCTGGACCGCGCCGTCGCCGTCAAGGTATTGAGCACCGACCTGGATCGG 

GACAATCTCGAGCGCTTCCTGCGCGAGCAGCGGGCCATGGGCCGCCTTTCCGGGCACCCGCA 

CATCGTGACCGTCTTGCAGGTGGGCGTGTTGGCGGGTGGGCGGCCCTTCATCGTGATGCCCTA 
CCACGCCAAGAATTCGTTGGAGACGCTGATTCGCCGGCACGGGCCGCTGGACTGGCGCGAGA

CGCTGTCGATCGGCGTCAAGCTCGCGGGAGCGCTGGAAGCCGCGCATCGCGTCGGCACCCTG 

CACCGTGACGTGAAGCCGGGGAATATCCTGCTGACCGACTACGGGGAACCGCAGCTGACCGAT 

TTCGGAATCGCCAGAATCGCCGGGGGTTTCGAGACGGCGACCGGGGTGATTGCCGGTTCCCCG 

GCTTTCACCGCGCCGGAAGTTCTCGAAGGAGCATCGCCGACGCCCGCCTCTGACGTGTACTCC 

CTGGGCGCGACGTTGTTCTGTGCGCTGACCGGCCATGCCGCCTACGAGCGCCGCAGCGGTGA 

GCGGGTGATCGCCCAGTTCCTGCGGATCACCTCGCAGCCGATCCCCGACCTGCGGAAGCAGG 

GACTGCCCGCGGACGTGGCCGCCGCCATCGAACGGGCGATGGCCCGCCATCCGGCGGATCGT 

CCCGCGACCGCGGCAGACGTTGGCGAGGAGCTTCGCGACGTTCAGCGCCGCAACGGCGTCAG 

CGTCGACGAGATGCCCCTCCCCGTCGAGCTGGGCGTGGAACGCCGACGCTCGCCCGAGGCGC 

ACGCGGCGCATCGGCATACCGGCGGCGGCACCCCGACGGTCCCGACGCCTCCGACACCCGCG 

ACCAAGTACCGGCCGTCGGTGCCCACCGGCTCGCTGGTCACCCGCAGCCGGCTCACCGACAT 

CCTGCGCGCCGGCGGACGGCGCCGGCTGATCCTCATCCACGCGCCCTCGGGATTCGGCAAAA 

GCACCCTGGCGGCGCAATGGCGGGAAGAGCTCTCGCGCGACGGCGCCGCGGTCGCCTGGCT 

GACAATCGACAACGACGACAACAACGAGGTGTGGTTCTTGTCGCACCTGCTCGAGTCGATCGG 

GCGGGTCCGGCCCACGCTGGCCGAGTCGTTGGGGCACGTGCTCGAAGAGCATGGGGATGACG 

CCGGCCGCTACGTGTTGACTTCGCTGATCGACGAAATCCACGAAAACGACGACCGGATCGCGG 

TGGTGATCGACGACTGGCATCGGGTGTCCGACAGCCGCACCCAAGCTGCCCTGGGTTTCCTGC 

TGGACAACGGATGTCACCACCTGCAGCTCATCGTGACCAGCTGGTCTCGCGCCGGTTTGCCGG 

TGGGCAGGTTGCGGATCGGCGACGAACTAGCCGAGATCGATTCGGCTGCTTTGCGCTTCGATA 

CCGACGAGGCCGCCGCGCTGCTGAACGATGCTGGTGGTCTGCGATTGCCGCGCGCAGACGTG 

CAGGCGCTGACTACCTCTACCGACGGGTGGGCCGCGGCGCTGCGGCTGGCCGCGCTGTCGCT 

GCGCGGCGGGGGCGACGCGACCCAACTCCTGCGCGGACTTTCCGGCGCCAGTGACGTGATCC 

ACGAATTCCTGAGCGAAAACGTGCTGGACACCCTGGAACCCGAACTGCGCGAATTCCTACTGGT 

GGCATCGGTCACCGAACGCACGTGCGGCGGGCTGGCCTCGGCGCTGGCCGGGATCACCAATG 

GGCGGGCGATGCTGGAAGAGGCCGAGCACCGCGGCTTGTTCCTGCAACGGACCGAAGACGAC 

CCGAATTGGTTTCGCTTCCACCAAATGTTCGCCGACTTTCTCCACCGTCGCCTCGAACGTGGCG 

GGTGGCACCGGGTGGCGGAACTGCACCGCAGGGCATCGGCCTGGTTCGCCGAGAACGGCTAC 

CTGCACGAAGCCGTCGACCATGCACTGGCCGCGGGCGATCCCGCGCGCGCCGTCGATCTTGT 

CGAGCAGGATGAAACGAACCTGCCGGAGCAGTCAAAGATGACCACACTTCTGGCAATCGTGCA 

GAAACTGCCGACGTCGATGGTGGTTTCACGGGCCCGGCTCCAACTCGCCATCGCGTGGGCGAA 

CATTCTGCTGCAACGGCCGGCGCCGGCCACCGGTGCCCTGAATCGTTTCGAAACGGCCCTTGG 

CCGGGCCGAGCTTCCCGAGGCGACGCAGGCGGATCTGCGGGCCGAGGCAGACGTGTTGCGG 

GCGGTCGCCGAGGTGTTCGCAGACCGGGTCGAGCGCGTGGATGACCTTCTCGCCGAGGCAAT 

GTCGAGACCGGACACCCTGCCCCCGGGAGTCCCCGGGACCGCCGGCAACACCGCGGCGTTGG 

CCGCGATCTGCCGCTTCGAGTTCGCCGAGGTATATCCACTGCTGGACTGGGCCGCGCCCTACC 

AGGAAATGATGGGACCGTTCGGCACCGTTTATGCGCAGTGGTTGCGCGGCATGGCGGCCAGGA 
ATCGGCTCGACATTGTCGCTGCGCTACAGAACTTCCGAACGGCGTTCGAGGTCGGCACGGCAG

TGGGGGCCCACTCGCACGCGGCGCGGCTTGCGGGTTCGCTGCTCGCCGAATTGCTCTACGAG 

ACCGGCGATCTGGCCGGGGCTGGTCGTCTCATGGACGAGAGCTATCTGCTGGGTTCCGAGGG 

GGGTGCAGTGGACTACCTGGCCGCCAGGTACGTGATCGGCGCGCGGGTCAAGGCGGCCCAGG 

GGGATCATGAGGGTGCGGCTGATCGCCTGTCCACCGGAGGCGATACTGCCGTCCAGCTGGGG 

CTGCCGCGCCTGGCTGCCCGAATCAACAACGAGCGGATCCGGCTGGGCATCGCGCTACCTGC 

GGCGGTGGCCGCCGATTTGCTGGCACCCCGCACCATCCCCCGCGACAATGGAATCGCCACCAT 

GACAGCCGAACTCGACGAGGACTCCGCGGTGCGCCTGTTGTCCGCCGGCGACTCCGCCGATC 

GTGACCAAGCCTGCCAACGGGCCGGTGCTCTCGCCGCCGCCATCGACGGTACGCGCAGACCG 

CTGGCGGCGCTGCAGGCGCAAATACTTCATATCGAAACGCTTGCCGCCACCGGACGGGAATCC 

GATGCGCGAAACGAACTGGCGCCGGTAGCCACGAAGTGCGCCGAACTCGGGCTGTCACGTCT 

GCTGGTCGATGCGGGACTGGCCTAA 

>Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase TB.seq 3474004:3475371 

MW:49342 >emb|AL123456|MTBH37RV:3474004-3475374, fprA SEQ ID NO:120 

ATGCGTCCCTATTACATCGCCATCGTGGGCTCCGGGCCGTCGGCGTTCTTCGCCGCGGCATCC 

TTGCTGAAGGCCGCCGACACGACCGAGGACCTCGACATGGCCGTCGACATGCTGGAGATGTTG 

CCGACTCCCTGGGGGCTGGTGCGCTCCGGGGTCGCGCCGGATCACCCCAAGATCAAGTCGAT 

CAGCAAGCAATTCGAAAAGACGGCCGAGGACCCCCGCTTCCGCTTCTTCGGCAATGTGGTCGT 

CGGCGAACACGTCCAGCCCGGCGAGCTCTCCGAGCGCTACGACGCCGTGATCTACGCCGTCG 

GCGCGCAGTCCGATCGCATGTTGAACATCCCCGGTGAGGACCTGCCGGGCAGTATCGCCGCX: 

GTCGATTTCGTCGGCTGGTACAACGCACATCCACACTTCGAGCAGGTATCACCCGATCTGTCGG 

GCGCCCGGGCCGTAGTTATCGGCAATGGAAACGTCGCGCTAGACGTGGCACGGATTCTGCTCA 

CCGATCCCGACGTGTTGGCACGCACCGATATCGCCGATCACGCTTTGGAATCGCTACGCCCAC 

GCGGTATCCAGGAGGTGGTGATCGTCGGGCGCCGAGGTCCGCTGCAGGCCGCGTTCACCACG 

TTGGAGTTGCGCGAGCTGGCCGACCTCGACGGGGTTGACGTGGTGATCGATCCGGCGGAGCT 

GGACGGCATTACCGACGAGGACGCGGCCGCGGTGGGCAAGGTCTGCAAGCAGAACATCAAGG 

TGCTGCGTGGCTATGCGGACCGCGAACCCCGCCCGGGACACCGCCGCATGGTGTTCCGGTTCT 

TGACCTCTCCGATCGAGATCAAGGGCAAGCGCAAAGTGGAGCGGATCGTGCTGGGCCGCAACG 

AGCTGGTCTC:CGACGGCAGCGGGCGAGTGGCGGCCAAGGACACCGGCGAGCGCGAGGAGCT 

GCCAGCTCAGCTGGTCGTGCGGTCGGTCGGCTACCGCGGGGTGCCCACGCCCGGGCTGCCGT 

TCGACGACCAGAGCGGGACCATCGCCAACGTCGGCGGCCGAATCAACGGCAGCCCCAACGAAT 

ACGTCGTCGGGTGGATCAAGCGCGGGCCGACCGGGGTGATCGGGACCAACAAGAAGGACX3CC 

CAAGACACCGTCGACACCTTGATCAAGAATCTTGGCAACGCCAAGGAGGGCGCCGAGTGCAAG 

AGCTTTCCGGAAGATCATGCCGACCAGGTGGCCGACTGGCTAGCAGCACGCCAGCCGAAGCTG 

GTCACGTCGGCCCACTGGCAGGTGATCGACGCTTTCGAGCGGGCCGCCGGCGAGCCGCACGG 

GCGTCCCCGGGTCAAGTTGGCCAGCCTGGCCGAGCTGTTGCGGATTGGGCTCGGCTGA 
>Rv3235 - TB.seq 3611296:3611934 MW:22659 >emb|AL123456|MTBH37RV:3611295-3611937.
Rv3235 SEQ ID NO:121

ATGATGGCCAGCAACCAAACCGCTGCGCAACACTCGTCTGCCACTCTCCAGCAGGCTCCTCGTT 

CGATCGATGATGCTGGAGGGTGCCCCTTGACCATCAGTCCTATCGCGAACTCACCGGGCGACA 

CCTTCGCCGTCACACCCGTCGTCGAGTACGAGCCGCCGCCGCGAAACATCCXJGCCGTGCGGG 

CAATCATCGCACGCAGCCCGGCGGCCGCACACCCCGCAGCTAGCTCGCCGACAACCAATCAGG 

CCGAGCGGCCGGGCACCGGCAGCGGTCACCTCCACGGCCAAGTCACCGCGGCTGCGTCAAGC 

GGGGACCTTCGCCGATGCCGCGCTACGCCGAGTGCTGGAGGTCATCGACCGCCGCCGCCCGG 

TGGGCCAGCTGCGCCCCCTGCTGGCACCCGGCCTCGTCGACTCCGTGCTCGCGGTGAGCCGC 

ACGGCGGCCGGACACCAACAAGGCGCGGCCATGCTGCGCCGCATCCGGCTGACACCGGCCGG 

ACCCGACACCGCGGACACCGCCGCCGAGGTCTTCGGCACCTACAGTCGCGGGGACCGGATCC

ATGCGATCGCCTGCCGGGTGGAACAACGGCCCGCCGGTAACGAAACCCGATGGCTGATGGTC 

GCCCTGCACATCGGGTGA 

>Rv3255c manA mannose-6-phosphate isomerase TB.seq 3635040:3636263 MW:43340 
>emb|AL123456|MTBH37RV:c3636263-3635037. manA SEQ ID NO:122

GTGGAACTGCTACGTGGCGCGTTACGCACCTACGCTTGGGGATCGCGCACCGCTATCGCCGAA 
TTCACCGGGCGTCCGGTGCCGGCCGCTCACXJCCGAGGCCGAACTATGGTTCGGTGCACACCC 
GGGTGATCCGGCTTGGCTGCAGACGCCGCATGGCCAAACCTCGTTGCTCGAAGCGTTGGTCGC 
GGATCCGGAGGGGCAGCTCGGCTCCGCGTCGCGCGCGCGATTCGGCGATGTGTTGCCGTTCT 
TGGTCAAGGTGTTGGCGGCCGACGAGCCACTATCGTTGCAGGCCCATCCGAGCGCCGAGCAG 
GCGGTTGAGGGCTACCTGCGGGAAGAGCGAATGGGCATTCCGGTGTCCTCACCCGTCX^GCAAC 
TACCGCGACACCAGTCACAAGCCAGAGTTATTGGTGGCGCTGCAGCCGTTCGAGGCGCTGGCC 
GGATTCCGGGAGGCGGCTCGCACCACCGAGCTGCTGCGGGCGCTGGCCGTATCCGACCTCGA 
CCCGTTCATCGACTTGCTGAGCGAGGGGTCCGATGCCGATGGTTTGCGTGCGCTGTTCACCAC 
CTGGATTACCGCACCCCAGCCCGACATCGACGTGCTGGTGCCTGCCGTGCTGGACGGCGCTAT 
CCAGTACGTCAGCTCCGGCGCAACGGAATTTGGCGCCGAAGCCAAGACAGTGCTGGAACTCGG 
CGAACGTTATCCCGGCGACGCCGGTGTGCTGGCGGCGTTGTTGCTCAACCGCATCAGCTTGGC 
TCCTGGGGAGGCGATCTTCCTGCCGGCCGGCAACCTGCACGCCTATGTGCGTGGTTTCGGTGT 
GGAAGTGATGGCCAACTCCGACAACGTGTTACGCGGTGGACTTACCCCTAAGCACGTCGATGT 
GCCCGAGTTGTTGCGGGTGCTGGACTTCGCCCCCACGCCGAAGGCTCGGCTGCGGCCCCCGA 
TCCGGCGCGAGGGGCTGGGGCTGGTCTTTGAGACGCCCACCGATGAGTTCGGGGCCACGCTA 
CTGGTGCTCGACGGCGATCACCTCGGCCACGAGGTCGACGCGTCGTCCGGCCATGACGGTCC 
ACAGATCTTGTTATGCACCGAGGGTTCGGCGACGGTGCACGGGAAGTGCGGGTCGCTCACGCT 
ACAGCGCGGCACGGCCGCCTGGGTGGCGGCCGACGACGGCCCGATCCGGCTGACCGCCGGG 
CAACCCGCCAAGCTGTTCAGGGCGACCGTCGGGTTGTGA 
>Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase TB.seq 3644897:3645973 MW:37840
>emb|AL123456|MTBH37RV:c3645973-3644894. rmlA2 SEQ ID NO:123

TTGGCAACTCACCAAGTCGATGCGGTGGTCCTGGTCGGTGGCAAGGGTACCCGACTGCGGCCG 
TTGACGCTGTCGGCGCCCAAGCCAATGCTGCCTACCGCCGGACTGCCGTTCCTCACCCATCTG
CTGTCGCGGATCGCCGCAGCGGGCATCGAGCACGTGATCCTGGGTACGTCCTACAAACCCGCA 
GTCTTCGAAGCGGAGTTCGGCGACGGGTCCGCACTGGGCCTACAGATCGAATACGTGACCGAG 
GAGCATCCCTTGGGGACTGGCGGCGGCATCGCCAACGTTGCCGGCAAGCTGCGCAACGACAC 
CGCGATGGTGTTTAACGGCGATGTGCTCTCGGGCGCGGATCTGGCCCAACTGCTGGACTTCCA 

CCGAAGCAATCGAGCCGATGTCACGCTGCAACTGGTGCGGGTGGGCGACCCGCGGGCATTCG
GCTGCGTACCCACCGACGAGGAGGACCGCGTAGTCGCCTTTCTGGAGAAGACGGAGGATCCG 
CCGACCGACCAGATCAATGCCGGCTGCTATGTCTTCGAACGCAACGTCATCGACCGGATTCCGC 
AGGGCCGGGAGGTTTCGGTGGAACGCGAGGTGTTCCCGGCCTTGGTCGCCGACGGCGAGTGC 
AAGATCTACGGCTATGTCGATGCCAGCTATTGGCGGGACATGGGCACACCGGAAGACTTCGTTC 

GCGGATCGGCGGATCTGGTGCGCGGCATCGCCCCGTCTCCGGCCTTGCGTGGTCACCGCGGT
GAGCAGTTGGTGCACGACGGTGCGGCGGTATCTCCCGGTGCGTTGCTGATTGGCGGCACCGTC 
GTGGGGCGTGGTGCCGAAATCGGCCCCGGCACCAGATTGGACGGCGCGGTCATCTTCGATGG 
TGTCCGGGTGGAGGCCGGGTGCGTGATCGAGCGTTCGATCATCGGCTTCGGTGCTCGCATCGG 
ACCGCGGGCGTTGATCCGCGACGGTGTGATCGGTGACGGGGCCGACATCGGCGCGCGCTGCG 

AGTTGTTAAGTGGTGCCCGGGTATGGCCCGGTGTCTTTCTTCCCGACGGCGGGATCCGTTACTC
GTCCGACGTTTGA 

>Rv3368c - TB.seq 3780334:3780975 MW:23734 >emb|AL123456|MTBH37RV:c3780975-3780331.
Rv3368c SEQ ID NO:124

ATGACCCTCAACCTGTCCGTCGACGAGGTCCTGACCACTACCCGCTCGGTGCGCAAGCGTCTC
GATTTCGACAAGCCGGTGCCACGCGACGTGCTGATGGAATGCCTCGAGCTGGCGCTGCAGGCG 
CCCACCGGTTCCAATTCCCAAGGCTGGCAGTGGGTGTTCGTCGAGGACGCCGCCAAGAAAAAG 
GCGATCGCCGACGTCTACCTGGCCAACGCCCGGGGCTACCTCAGCGGGCCGGCGCCGGAGTA 
CCCCGACGGCGACACCCGCGGCGAGCGGATGGGGCGGGTCCGCGATTCGGCGACCTATCTCG 

CCGAACACATGCACXJGGGCGCCGGTGCTGCTGATCCCGTGCCTGAAAGGCCGGGAAGACGAG
TCGGCGGTGGGTGGCGTGTCGTTTTGGGCCTCACTGTTCCCGGCGGTGTGGAGCTTCTGCCTG 
GCGCTGCGCTCCXX3CGGGCTGGGTTCGTGCTGGACGACGCTGCACCTGCTCGACAACGGCGA 
GCACAAGGTGGCCGACGTGCTCGGCATTCCCTACGACGAATACAGCCAAGGCGGGCTGCTTCC 
GATCGCCTACACACAAGGCATCGACTTCCGGCCGGCCAAGCGGCTGCCGGCCGAGAGCGTGA 

CGCACTGGAACGGCTGGTAA
>Rv3382c lytB1 TB.seq 3796447:3797433 MW:34667
>emb|AL123456|MTBH37RV:c3797433-3796444. lytB SEQ ID NO:125

ATGGCTGAGGTGTTCGTGGGACCGGTCGCACAGGGATACGCTTCGGGTGAAGTCACGGTGCTG 
TTGGCGTCGCCGCGGTCGTTTTGCGCCGGTGTAGAGCGTGCTATCGAGACGGTCAAGCGAGTG 
CTTGACGTGGCCGAAGGCCGGGTGTATGTGCGCAAGCAAATCGTGCACAACACTGTTGTGGTT
GCCGAGTTGCGGGACCGGGGAGCAGTGTTCGTCGAGGATCTCGACGAGATTCCCGATCCGCC 
GCCGCCGGGGGCGGTCGTGGTGTTCTCCGCGCATGGGGTTTCCCCGGCGGTGCGCGCGGGC 
GCTGATGAGCGGGGAGTGCAGGTCGTCGACGCGACCTGCCCACTGGTGGCGAAAGTCCACGC 

tgaagccgcacggtttgccgcgcgcggtgacacggtggtcttcatcgggcacgccggacatg 
aggagacggaaggcacgcttggcgtcgctccgcggtcaacattattggtgcagacacxx:gctg
atgtggcagcgttgaacctgcccgagggtacccagctatcgtatctgacccagacaaccctgg 
cacttgatgaaactgccgatgtcattgatgcgctgcgcgcgaggtttccgacgttgggccaacc 
cccctctgaagacatctgctatgccaccacgaacagacagcgtgcgctgcaatcgatggtcggt 
gaatgtgacgttgtgttggtgattggctcgtgcaattcgtcgaattcgcggcgtctggtcgagt 
tggcgcagcgaagtgggacgccggcctacttgattgacgggcctgatgacattgagcccgaat
ggctgtcgtcggtctcgacgatcggtgtcaccgcgggagcctcggcgccgccacgactggtg 
gggcaggtgattgatgcacttcgcggatacgcctcgatcaccgtggtggaacgctcgatagcg 
accgagacggtgcgattcggccttcccaaacaggttcgcgcgcaatga 

>Rv3418c groES 10 kD chaperone TB.seq 3836985:3837284 MW:10773
>emb|AL123456|MTBH37RV:c3837284-3836982, groES SEQ ID NO:126 

gtggcgaaggtgaacatcaagccactcgaggacaagattctcgtgcaggccaacgaggccgag 
accacgaccgcgtccggtctggtcattcctgacaccgccaaggagaagccgcaggagggcac 
cgtcgttgccgtcggccctgggcggtgggacgaggacggcgagaagcggatcccgctggacg 
ttgcggagggtgacaccgtcatctacagcaagtacggcggcaccgagatcaagtacaacggcg
aggaatacctgatcctgtcggcacgcgacgtgctggccgtcgtttccaagtag 

>Rv3423c alr TB.seq 3840193:3841416 MW:43357
>emb|AL123456|MTBH37RV:c3841416-3840190, alr SEQ ID NO:127

gtgaaacggttctgggagaatgtcggaaagccaaacgacacgacagatgggcggggcacgact
tcgttgggcatgacaccgatatcccagacacctggcctcctcgccgaggccatggtggatctg 
ggcgctattgaacacaacgtgcgggtgctgggtgagcacgccggccacgcgcagctgatggc 
ggtggtcaaggccgacggctacggtcacggtggtacgcgcgtcgcccaaaccgccctgggag 
ccggtgcggccgaactcggcgtcgccaccgtcgacgaggcgctagcgctgcgcgctgatggc 

attaccgcaccggtgctggcctggctgcatccgcccggcatcgacttcgggcccgcgctgctg
gccgacgtgcaggtcgcggtgtcctcgctgcgccaactcgacgaactgttgcacgcggtgcg 
ccggaccggccggacggcgacggtgaccgtcaaggtggataccgggctgaaccgcaatggcg 
TGGGACCGGCACAATTCCCGGCCATGCTGACCGCGTTACGCCAAGCCATGGCCGAGGACGCC

GTCCGGCTGCGGGGGCTGATGTCGCATATGGTTTACGCCGACAAGCCTGACGATTCCATCAAC 

GATGTTCAGGCCCAACGGTTTACCGCCTTTCTGGCGCAGGCCCGCGAACAAGGGGTGCGGTTC 

GAGGTGGCGCATCTATCGAACTCATCAGCAACTATGGCGCGCCCCGACCTGACGTTCGACCTG 

GTGCGGCCGGGCATCGCGGTGTATGGGCTAAGCCCXSGTACCCGCCCTCGGTGACATGGGGCT 

GGTGCCGGCGATGACCGTGAAATGTGCTGTTGCGCTGGTGAAATCGATTCGTGCGGGGGAGGG 

CGTGTCGTATGGGCACACATGGATCGCGCCACGCGACACCAATCTGGCGCTGCTGCCGATCGG 

TTACGCAGACGGCGTGTTCCGGTCGCTGGGCGGGCGGCTGGAGGTGCTGATCAACGGCAGAC 

GATGCCCCGGTGTGGGGCGGATCTGCATGGACCAGTTCATGGTCGACCTGGGCCCCGGGCGG 

CTTGATGTGGCCGAAGGCGACGAGGCGATTTTGTTCGGGCCGGGCATCCGGGGTGAGCCCAC 

GGCTCAGGACTGGGCCGATCTTGTCGGCACCATCCACTACGAAGTGGTCACCAGCCCGCGAGG 

ACGTATCACCAGGACCTATCGCGAGGCTGAAAACCGTTGA 

>Rv3490 otsA [alpha],-trehalose-phosphate synthase TB.seq 3908232:3909731 MW:55864 
>emb|AL123456|MTBH37RV:3908232-3909734, otsA SEQ ID NO:128 

ATGGCTCCCTCGGGAGGCCAGGAGGCGCAGATTTGCGATTCGGAGACCTTCGGGGACTCTGAC 

TTCGTGGTGGTAGCCAATGGACTGCCCGTCGATCTGGAGCGTCTTCCCGACGGCAGCACAACC 

TGGAAACGCAGCCCCGGAGGCTTGGTCACCGCCTTGGAGCCGGTGCTGCGGCGTCGGCGCGG 

GGCCTGGGTCGGCTGGCCCGGCGTTAACGACGACGGGGCCGAACCCGACCTCCACGTGCTGG 

ACGGCCCCATCATCCAAGACGAGCTGGAACTTCATCCGGTACGGCTGAGCACCACGGACATAG 

CTCAGTACTACGAGGGATTCTCCAACGCCACACTGTGGCCGCTGTACCACGACGTCATCGTCAA 

GCCGCTCTACCACCGCGAATGGTGGGATCGCTACGTCGACGTCAACCAGCGCTTTGCCGAGGC 

CGCGTCGCGCGCCGCCGCCCACGGCGCAACCGTGTGGGTACAGGACTACCAGCTGCAGCTGG 

TACCGAAGATGCTGCGCATGCTGCGGCCCGATCTGACCATCGGTTTCTTTTTGCACATCCCGTT 

CCCGCCGGTAGAGCTGTTTATGCAGATGCCGTGGCGCACCGAGATCATCCAGGGCCTACTGGG 

CGCCGACCTGGTGGGCTTCCATCTTCCGGGCGGTGCCCAGAATTTCCTGATCCTGTCCCGGCG 

TCTGGTCGGCACCGACACTTCCCGCGGAACCGTCGGTGTGCGGTCGCGGTTCGGTGCGGCGG 

TGCTCGGGTCCCGCACCATACGAGTTGGCGCCTTTCCTATCTCGGTTGACTCCGGCGCGCTCG 

ACCACGCTGCCCGCGACCGCAACATCAGGCGCCGGGCCCGCGAGATTCGCACCGAACTGGGA 

AATCCGGGCAAGATCCTGCTCGGTGTTGACCGGCTCGACTACACCAAGGGCATCGACGTACGG 

CTGAAGGCCTTTTCCGAGCTGCTGGCCGAGGGCCGCGTCAAACGCGACGACACCGTCGTGGTC 

CAGCTGGCTACCCCGAGCCGCGAGCGGGTGGAGAGCTACCAGACGCTGCGCAACGACATCGA 

ACGCCAGGTCGGCCACATTAAGGGCGAGTACGGTGAGGTTGGCCATCCGGTAGTGCATTACCT 

GCATCGACCGGCTCCGCGCGACGAGCTTATCGGTTTCTTCGTGGCCAGCGACGTCATGCTGGT 

CACCCCAGTACGCGACGGGATGAACCTGGTGGCCAAGGAGTACGTCGCTTGCCGCAGCGATCT 

TGGCGGTGCCCTGGTGCTCAGCGAATTCACCGGGGCCGCAGCCGAACTCCGGCACGCATACCT 

GGTCAACCCGCACGACCTGGAAGGCGTCAAGGACGGGATAGAGGAAGCGCTCAACCAGACGG 
AGGAGGCGGGCCGGCGGCGAATGCGGTCGCTGCGACGCCAAGTGCTCGCCCACGACGTGGA
CCGCTGGGCACAGTCGTTTCTCGACGCTCTCGCCGGGGCACACCCGAGGGGCCAAGGCTAA 

>Rv3598c lysS lysyl-tRNA synthase TB.seq 4041423:4042937 MW:55678
>emb|AL123456|MTBH37RV:c4042937-4041420, lysS SEQ ID NO:129

GTGAGTGCCGCTGACACAGCAGAAGACCTTCCTGAGCAGTTCCGGATTCGCCGGGACAAGCGC 

GCTCGCTTGCTGGCCCAGGGGCGCGATCCCTATCCCGTCGCGGTGCCGCGCACTCACACGTTG 

GCCGAGGTTCGCGCCGCCCACCCTGACTTGCCGATCGATACCGCGACCGAAGACATCGTCGGG 

GTCGCGGGCCGAGTGATCTTTGCGCGCAACTCGGGAAAGCTATGCTTTGCGACACTTCAGGAC 

GGCGATGGTACCCAGCTGCAAGTGATGATCAGCCTCGACAAGGTCGGCCAGGCTGCTCTCGAC 

GCATGGAAAGCCGATGTCGACCTGGGCGACATCGTCTACGTGCATGGCGCGGTGATCAGTTCG 

CGCCGCGGCGAGCTGTCCGTCCTGGCGGATTGCTGGCGGATCGCCGCCAAGTCGCTGCGGCC 

GCTTCCCGTCGCGCACAAAGAGATGAGTGAAGAGTCGCGGGTTCGTCAGCGCTATGTTGACCT 

CATAGTTCGACGGGAAGCGCGCGCGGTGGCTCGACTACGGATCGCCGTCGTCCGCGCGATCC 

GGACGGCGCTTCAACGTCGTGGGTTCCTGGAAGTCGAGACGCCCGTCTTGCAGACGTTAGCCG 

GTGGTGCGGCGGCCCGTCCGTTCGCCACTCATTCCAATGCCGTAGACATCGATCTGTACCTGCG 

GATCGCGCCGGAACTGTTCCTCAAGCGCTGCATCGTGGGTGGTTTCGACAAGGTCTTCGAACTT 

AATCGAGTGTTCCGAAACGAAGGAGCCGATTCCACGCATTCTCCGGAATTCTCCATGCTGGAGA 

CCTACCAGACCTACGGAACCTATGACGATTCGGCAGTCGTCACCCGGGAGCTTATTCAAGAGGT 

GGCCGATGAGGCGATCGGAACCAGACAACTGCCGTTGCCCGACGGCAGTGTCTATGACATCGA 

CGGAGAATGGGCGACTATACAAATGTACCCGTCGCTGTCTGTGGCGCTCGGTGAAGAGATCAC 

ACCGCAGACGACGGTCGATCGCTTACGTGGGATCGCCGATAGCCTTGGCCTGGAGAAAGACCC 

AGCGATTCATGACAACCGTGGCTTCGGCCACGGCAAACTCATCGAGGAACTCTGGGAGCGCAC 

AGTGGGCAAGAGCTTGAGCGCACCCACATTTGTCAAGGATTTTCCGGTTCAGACAACGCCTTTG 

ACCCGTCAGCACCGCAGTATCCGCGGCGTAACCGAGAAGTGGGACCTCTATCTGCGCGGAATC 

GAAC7TGCCACCGGCTACTCGGAATTAAGCGACCCGGTAGTCCAGCGGGAGAGATTCGCCGAC 

CAGGCCCGTGCCGCGGCCGCTGGCGATGACGAAGCGATGGTGCTTGACGAGGATTTTCTGGCC 

GCTCTGGAGTACGGCATGCCACCGTGCACCGGAACCGGAATGGGTATCGATCGGTTGTTGATG 

TCTTTGACTGGGTTGTCAATTAGGGAGACAGTTTTGTTCCCGATTGTTCGACCACACTCCAACTG 

A 

>Rv3600c - similar to Bacillus subtilis protein YacB TB.seq 4043041:4043856 MW:29274
>emb|AL123456|MTBH37RV:c4043856-4043038, Rv3600c SEQ ID NO:130

GTGCTGCTGGCGATTGACGTCCGCAACACCCACACCGTTGTGGGCCTGCTGTCCGGAATGAAA 
GAGCACGCAAAGGTCGTGCAGCAGTGGCGGATACGCACCGAATCCGAAGTCACCGCCGACGAA 
CTGGCACTGACGATCGACGGGCTGATCGGCGAGGATTCCGAGCGGCTCACCGGTACCGCCGC 
CTTGTCCACGGTCCCGTCCGTGCTGCACGAGGTGCGGATAATGCTCGACCAGTACTGGCCGTC 
GGTGCCGCACGTGCTGATCGAGCCCGGAGTACGCACCGGGATCCCTTTGCTCGTCGACAACCC
GAAGGAAGTGGGCGCAGACCGCATCGTGAACTGTTTGGCCGCCTATGACCGGTTCCGGAAGGC 
CGCCATCGTCGTTGACTTTGGATCCTCGATCTGTGTTGATGTTGTATCGGCCAAGGGTGAATTTC 
TTGGCGGCGCCATCGCGCCCGGGGTGCAGGTGTCTTCCGATGCCGCGGCGGCCCGCTCGGCG 
GCATTGCGCCGCGTTGAACTTGCCCG(XCACGTTCGGTGGTTGGCAAGAACACCX3TCGAATGC
ATGCAAGCCGGTGCGGTGTTCGGCTTCGCCGGGCTGGTAGACGGGTTGGTAGGCCGCATCCG 
CGAGGACGTGTCCGGTTTCTCCGTCGACCACGATGTCGCGATCGTGGCTACCGGGCATACCGC 
GCCCCTGCTGCTGCCGGAATTGCACACCGTCGACCATTACGACCAGCACCTGACCTTGCAGGG 
TCTGCGGCTGGTGTTCGAGCGTAACCTCGAAGTCCAGCGCGGCCGGCTCAAGACGGCGCGCT 
GA



>Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase TB.seq
4048181:4048744 MW:20732 >emb|AL123456|MTBH37RV:c4048744-4048178. folK
SEQ ID NO:131

ATGACGCGGGTAGTGCTCTCGGTTGGCTCCAACCTGGGTGACCGCCTGGCACGATTGCGGTCG
GTCGCCGACGGTCTCGGCGATGCGTTGATTGCGGCTTCCCCGATATATGAGGCCGACCCCTGG 
GGTGGGGTGGAGCAGGGGCAGTTCCTCAATGCGGTGCTGATCGCCGACGATCCTACCTGCGAA 
CCGCGGGAGTGGCTGCGGCGGGCGCAGGAGTTCGAGCGCGCTGCGGGCAGGGTGCGTGGCC 
AGCGCTGGGGTCCACGAAATCTCGACGTCGACCTGATCGCCTGCTACCAGACCTCGGCCACCG 

AGGCTCTGGTCGAAGTGACCGCGCGGGAGAACCACCTCACGCTGCCGCACCCACTGGCGCAT
CTGCGGGCCTTTGTGTTGATCCCGTGGATTGCCGTCGACCCAACGGCGCAGCTGACGGTTGCC 
GGGTGCCCGCGGCCCGTCACGCGACTGCTGGCCGAGCTGGAGCCCGCCGACCGCGACAGTGT 
GCGGTTGTTTAGGCCGTCGTTCGATCTGAATAGCAGACACCCCGTCAGTCGGGCACCGGAAAG 
CTGA 

>Rv3607c folX may be involved in folate biosynthesis TB.seq 4048744:4049142 MW:14553
>emb|AL123456|MTBH37RV:c4049142-4048741. folX SEQ ID NO:132

ATGGCTGACCGAATCGAACTGCGGGGCCTGACCGTGCATGGTCGGCACGGGGTCTACGACCAC 
GAGCGAGTGGCCGGGCAGCGGTTTGTCATCGATGTCACCGTGTGGATAGACCTGGCCGAGGC 
CGCCAACAGCGACGACTTGGCCGACACCTATGACTACGTGCGGCTGGCTTCGCGGGCGGCCG
AGATCGTCGCCGGACCCCCGCGGAAGCTGATCGAAACGGTCGGGGCCGAGATCGCTGATCAC 
GTGATGGACGACCAGCGAGTGCATGCCGTTGAGGTGGCGGTACAGAAGCCGCAGGCGCCCATT 
CCGCAGACGTTCGACGATGTGGCGGTGGTGATCCGACGCTCACGGCGCGGCGGCCGCGGTTG 
GGTAGTCCCX3GCGGGCGGCGCGGTATGA 

>Rv3608c folP dihydropteroate synthase TB.seq 4049138:4049977 MW:28812
>emb|AL123456|MTBH37RV:c4049977-4049135. folP SEQ ID NO:133
GTGAGTCCGGCGCCCGTGCAGGTGATGGGGGTTCTAAACGTCACGGACGACTCTTTCTCGGAC

GGCGGGTGTTATCTCGATCTCGACGATGCGGTGAAGCACGGTCTGGCGATGGCAGCCGCAGGT 

GCGGGCATCGTCGACGTGGGTGGTGAGTCGAGCCGGCCCGGTGCCACTCGGGTTGACCCGGC 

GGTGGAGACGTCTCGTGTCATACCCGTCGTCAAAGAGCTTGCAGCACAAGGCATCACCGTCAG 

CATCGATACCATGCGCGCGGATGTCGCTCGGGCGGCGTTGCAGAACGGTGCCCAGATGGTCAA 

CGACGTGTCGGGTGGGCGGGCCGATCCGGCGATGGGGCCGCTGTTGGCCGAGGCCGATGTG 

CCGTGGGTGTTGATGCACTGGCGGGCGGTATCGGCCGATACCCCGCATGTGCCTGTGCGCTAC 

GGCAACGTGGTGGCCGAGGTCCGTGCCGACCTGCTGGCCAGCGTCGCCGACGCGGTGGCCGC 

AGGCGTCGACJCCGGCAAGGCTGGTGCTCGATCCCGGGCTTGGATTCGCCAAGACGGCGCAAC 

ATAATTGGGCGATCTTGCATGCCCTTCCGGAACTGGTCGCGACCGGAATCCCAGTGCTGGTGG 

GTGCTTCGCGCAAGCGCTTCCTCGGTGCGTTGTTGGCCGGGCCCGACGGCGTGATGCGGCCA 

ACCGATGGGCGTGACACCGCGACGGCGGTGATTTCCGCGCTGGCCGCACTGCACGGGGCCTG 

GGGTGTGCGGGTGCATGATGTGCGGGCCTCGGTCGATGCCATCAAGGTGGTCGAAGCX3TGGAT 

GGGAGCGGAAAGGATAGAACGCGATGGCTGA 

>Rv3609c folE GTP cyclohydrolase I TB.seq 4049977:4050582 MW:22395
>emb|AL123456|MTBH37RV:c4050582-4049974, folE SEQ ID NO:134

ATGTCGCAGCTGGATTCGCGCAGCGCATCTGCTCGTATCCGTGTGTTCGACCAGCAACGTGCC 

GAGGCCGCGGTGCGCGAATTGCTGTACGCGATCGGCGAGGATCCGGATAGGGACGGCTTGGT 

AGCCACCCXSGTCCCGGGTTGCCCGGTCATACCGCGAAATGTTCGCCGGGCTCTACACCGACCC 

CGACTCGGTGTTGAACACCATGTTCGACGAAGACCACGACGAGCTGGTGTTGGTCAAGGAAATC 

CCTATGTACTCCACCTGCGAACACCACCTGGTGGCGTTCCACGGTGTGGCCCACGTCGGCTAC 

ATCCCGGGCGACGACGGCAGGGTGACCGGCTTGTCAAAGATCGCGCGACTGGTCGATCTGTAC 

GCCAAGCGACCTCAGGTCCAGGAGCGGCTCACCAGTCAGATCGCCGATGCCCTGATGAAAAAA 

CTCGATCCACGCGGGGTAATCGTGGTGATCGAGGCTGAGCATCTGTGCATGGCGATGCGCGGG 

GTTCGCAAGCCCGGCTCGGTCACCACTACGTCGGCGGTGCGCGGACTGTTCAAAACCAATGCC 

GCTTCTCGAGCCGAAGCGCTCGACCTCATTTTGCGGAAGTGA 

>Rv3610c ftsH inner membrane protein, chaperone TB.seq 4050601:4052880 MW:81987
>emb|AL123456|MTBH37RV:c4052880-4050598, ftsH SEQ ID NO:135

ATGAACCGGAAAAACGTGACTCGCACCATAACAGCGATCGCCGTCGTGGTGCTGCTCGGCTGG 

TCGTTCTTTTACTTCAGCGACGACACCCGCGGCTACAAGCCCGTTGATACCTCGGTGGCGATAA 

CACAGATCAACGGCGACAACGTCAAGAGCGCACAGATCGACGATCGCGAGCAACAGCTGCGGC 

TGATCCTGAAGAAGGGTAACAACGAGACCGAGGGGTCCGAGAAGGTCATCACCAAGTACCCCA 

CCGGGTACGCCGTCGACCTGTTCAACGCGCTCAGCGCCAAAAACGCGAAGGTCAGCACGGTCG 

TCAACCAGGGCAGCATCCTGGGCGAGCTGCTGGTCTACGTGCTGCCGCTGCTGTTGCTGGTGG 

GGCTGTTCGTGATGTTCTCCCGCATGCAAGGCGGCGCCCGGATGGGCTTCGGGTTCGGCAAGT 
CACGCGCCAAGCAACTGAGCAAGGACATGCCCAAGACCACCTTCGCCGACGTCGCAGGTGTCG
ACGAGGCGGTCGAGGAGCTCTACGAGATCAAGGACTTCCTGCAGAACCCCAGCAGGTACCAAG 
CGCTGGGCGCCAAGATCCCCAAAGGCGTGCTGCTCTACGGGCCGCCGGGAACCGGTAAGACG 
TTGCTGGCTCGTGCGGTGGCCGGCGAAGCCGGAGTGCCGTTCTTCACCATCTCCGGCTCCGAC 
TTCGTCGAAATGTTCGTCGGCGTCGGCGCATCCCGTGTCAGAGACCTGTTCGAGCAGGCCAAG
CAGAACAGCCCGTGCATCATCTTCGTCGACGAGATCGACGCCGTCGGCCGACAAAGAGGCGCC 
GGGCTGGGCGGCGGTCACGACGAGCGTGAGCAGACCCTCAACCAGTTGCTAGTCGAAATGGA 
CGGTTTTGGCGATCGCGCCGGCGTCATCCTGATCGCGGCCACCAACCGGCCCGACATCCTGGA 
CCCGGCGCTGTTGCGGCCGGGCCGCTTCGACCGCCAGATCCCGGTATCCAACCCGGATCTGG 

CGGGTCGGCGGGCGGTGCTGCGCGTGCACTCCAAGGGCAAGCCGATGGCCGCGGACGCGGA
CCTCGACGGACTGGCCAAGCGGACCGTCGGCATGACCGGAGCCGACCTGGCCAACGTCATCA 
ACGAGGCGGCGCTGCTGACCGCCCGGGAGAACGGCACCGTCATCACCGGTCCCGCCCTCGAG 
GAAGCGGTGGACCGGGTGATCGGCGGCCCGCGCCGCAAAGGCCGGATCATCAGCGAGCAGGA 
GAAGAAGATCACCGCCTATCACGAGGGCGGGCACACCCTGGCCGCTTGGGCGATGCCCGATAT 

CGAGCCGATTTATAAGGTGACGATCCTGGCGCGCGGGCGTACCGGCGGGCACGCGGTGGCGG
TGCCGGAAGAAGACAAGGGCCTGCGGACCCGCTCGGAAATGATCGCGCAACTGGTGTTCGCGA 
TGGGTGGGCGCGCCGCCGAAGAACTGGTGTTTCGTGAGCCGACCACCGGCGCGGTGTCCGAC 
ATCGAGCAGGCCACCAAGATAGCGCGCTCAATGGTCACCGAATTTGGAATGAGCTCCAAGCTG 
GGCGCGGTCAAATACGGCTCCGAACACGGCGACCCGTTCCTCGGACGTACCATGGGCACCCAG 

CCGGACTACTCCCACGAGGTCGCCCGGGAGATCGACGAAGAGGTCCGCAAGCTTATCGAGGCG
GCGCATACCGAAGCGTGGGAAATCCTGACCGAATACCGCGACGTGCTGGACACTTTGGCCGGC 
GAGCTGCTGGAAAAGGAGACCCTGCACCGACCCGAGCTGGAAAGCATCTTCGCTGACGTCGAA 
AAGCGGCCGCGGGTCACCATGTTCGACGACTTCGGTGGCCGGATCCCGTCGGACAAACCGCCC 
ATCAAGACACCCGGCGAGCTCGCGATCGAACGCGGGGAACCTTGGCCCCAGCCGGTCCCCGA 

GCCGGCGTTCAAGGCGGCGATTGCGCAGGCTACCCAAGCCGCTGAGGCCGCCCGGTCCGACG
GCGGCCAAACCGGGCACeGCGCCAACGGTTCGCCCGCCGGCACCCACCGGTCCGGTGACCGC 
GAGTACGGCTCCACCCAGCCTGACTACGGTGCCCCGGCGGGCTGGCATGCGGCGGGATGGCC 
CCCAAGGTCATCTCATCGGCCCAGCTATAGCGGTGAACCGGCACCGACGTATCCGGGTCAGCC 
CTACCCGACCGGTCAAGCCGATCCGGGTTCCGATGAGTCCTCGGCGGAGCAGGATGACGAGGT 

CAGTCGGACCAAGCCGGCCCACGGCTGA

>Rv3671c - TB.seq 4112322:4113512 MW:40722 >emb|AL123456|MTBH37RV:c4113512-4112319.
Rv3671c SEQ ID NO:136

ATGACCCCGTCGCAGTGGCTGGATATCGCCGTCTTGGCGGTCGCATTTATTGCAGCCATCTCCG 
GCTGGCGTGCCGGTGCGCTGGGCTCAATGCTGTCGTTTGGCGGGGTGCTGCTGGGCGCGACA
GCCGGCGTGCTGCTGGCGCCGCATATCGTCAGTCAAATCAGCGCTCCGCGGGCCAAACTGTTT 
GCCGCGCTGTTCCTGATCCTGGCACTGGTCGTAGTCGGCGAGGTCGCTGGTGTGGTGCTGGGC 
CGCGCCGTCCGCGGGGCGATCCGTAACCGGCCGATCCGGTTGATCGACTCGGTCATTGGGGTA
GGGGTGCAGCTGGTCGTGGTGCTCACCGCGGCGTGGTTGTTGGCGATGCCGCTGACACAGTC 
GAAAGAGCAGCCCGAGCTGGCTGCCGCGGTGAAGGGTTCGCGGGTGCTCGCCCGGGTCAACG 
AGGCGGCACCCACCTGGCTGAAGACGGTGCCCAAGCGGCTGTCGGCCCTGCTGAACACCTCC 
GGCCTGCCCGCGGTTTTGGAGCCGTTCAGCCGCACGCCGGTCATTCCAGTGGCCTCACCCGAC
CCAGCGCTGGTCAACAATCCGGTGGTGGCGGCCACCGAGCCAAGTGTCGTCAAAATCCGCAGC 
CTGGCACCCAGATGCCAGAAAGTGTTGGAGGGCACCGGCTTCGTGATCTCACCCGATCGGGTG 
ATGACCAACGCGCACGTGGTGGCCGGATCCAACAACGTCACGGTGTATGCCGGCGACAAGCCC 
TTCGAGGCCACGGTGGTGTCCTACGACCCGTCGGTCGACGTAGCGATCCTGGCCGTTCCGCAC 

TTGCCGCCGCCGCCGCTGGTCTTCGCTGCGGAGCCGGCGAAAACCGGTGCCGACGTTGTGGT
GCTGGGTTATCCCGGCGGCGGCAATTTCACTGCCACACCCGCCAGGATTCGCGAGGCCATCAG 
ACTCAGTGGCCCCGATATTTACGGGGACCCGGAGCCGGTTACCCGCGACGTGTACACCATCAG 
AGCCGATGTGGAGCAAGGTGATTCGGGTGGGCCCCTGATCGACCTCAACGGTCAGGTGCTCGG 
TGTGGTGTTCGGCGCAGCCATCGACGACGCCGAAACTGGGTTTGTGCTGACGGGCGGCGAGGT 

GGCGGGGCAGCTTGCCAAAATCGGTGCTACCCAACCGGTCGGCACCGGGGCCTGCGTCAGCT
GA 

>Rv3682 ponA2 TB.seq 4121913:4124342 MW:84637 
>emb|AL123456|MTBH37RV:4121913-4124345. ponA' SEQ ID NO:137 

ATGCCCGAGCGCCTCCCGGCCGCGATCACCGTTCTGAAGCTGGCTGGGTGCTGTCTGTTGGCC
AGTGTCGTCGCCACTGCGCTGACGTTCCCGTTCGCAGGCGGGCTAGGGCTGATGTGCAATCGT 
GCCTCTGAGGTCGTTGCCAACGGCTCGGCCCAGCTGCTCGAGGGGCAAGTGCCTGCGGTATCG 
ACGATGGTCGACGCGAAGGGCAACACGATCGCGTGGCTGTACTCGCAGCGCCGGTTCGAGGT 
GCCCTCGGACAAGATCGCCAACACGATGAAGCTGGCGATCGTCTCGATTGAAGATAAGCGGTTC 

GCCGACCACAGCGGCGTGGACTGGAAGGGCACCCTGACCGGCCTGGCGGGCTACGCGTCCG
GCGACGTCGACACGCGCGGCGGCTCGACGCTCGAACAACAGTACGTGAAGAACTACCAACTGC 
TGGTGACAGCCCAAACCGATGCCGAGAAGCGAGCGGCCGTCGAAACCACTCCGGCCCGCAAG 
CTTCGCGAGATCCGGATGGCACTCACGCTGGACAAGACCTTCACAAAATCTGAAATCCTGACCC 
GATACTTGAACCTGGTCTCGTTCGGCAATAACTCGTTCGGCGTGCAGGACGCGGCGCAAACGTA 

CTTCGGCATCAACGCGTCCGACCTGAATTGGCAGCAAGCGGCGCTGCTGGCCGGCATGGTGCA
ATCGACCAGCACGCTCAACCCGTACACCAACCCCGACGGCGCGCTGGCCCGGCGGAACGTGG 
TCCTCGACACCATGATCGAGAACCTTCCCGGGGAGGCGGAGGCGTTGCGTGCCGCCAAGGCC 
GAGCCGCTGGGGGTACTGCCGCAGCCCAATGAGTTGCCGCGCGGCTGCATCGCGGCCGGCGA 
CCGCGCATTCTTCTGCGACTACGTCCAGGAGTACCTGTCTCGGGCCGGGATCAGCAAGGAGCA

GGTCGCCACGGGCGGGTACCTGATCCGCACCACCCTGGACCCAGAGGTGCAGGCACCGGTCA
AGGCCGCCATCGACAAGTACGCCAGCCCGAACCTGGCCGGTATTTCCAGCGTGATGAGCGTGA 
TCAAACCGGGTAAGGATGCGCACAAGGTGTTGGCCATGGCCAGTAACCGCAAATACGGGCTGG 
ATCTAGAAGCCGGCGAAACCATGCX3GCCGCAGCCATTCTCCCTGGTTGGCGACGGCX3CCGGGT

CTATCTTCAAGATCTTCACCACGGCCGCTGCTCTGGACATGGGCATGGGTATTAACGCCCAACT 

CGACGTGCCGCCCCGATTCCAGGCCAAAGGTCTGGGAAGTGGCGGGGCAAAGGGGTGCCCCA 

AAGAGACCTGGTGTGTGGTGAACGCCGGCAACTACCGCGGCTCGATGAATGTCACCGACGCGC 

TGGCAACCTCGCCAAACACCGCGTTCGCCAAGCTGATCTCGCAGGTCGGGGTGGGGCGTGCG 

GTCGATATGGCCATCAAACTCGGGCTGAGGTCTTATGCGAATCCCGGCACCGCACGCGACTAC 

AACCCCGACAGCAATGAGAGCTTGGCTGACTTCGTCAAACGACAGAACCTGGGTTCGTTCACCC 

TCGGCCCCATCGAGTTAAAGGCGCTGGAGCTGTCCAACGTGGCGGCCACGTTGGCATCCGGCG 

GCGTGTGGTGCCCCCGCAACCCAATCGACCAGCTCATCGACCGCAACGGCAACGAAGTCGCGG 

TCACCACCGAGACGTGCGACCAGGTGGTGCCCGCAGGGCTGGCGAACACCCTCGCCAACGCG 

ATGAGCAAGGACGCCGTGGGCAGCGGCACGGCGGCCGGTTCGGCCGGCGCGGCGGGCTGGG 

ATCTGCGGATGTCCGGCAAAACCGGCACCACCGAGGCGCACCGGTCGGCCGGCTTCGTGGGC 

TTCACCAACCGCTACGCGGGGGCGAACTACATCTACGACGACTCCAGCTCGCCGACAGATCTGT 

GTTCCGGGCCGCTGCGCCATTGCGGCAGCGGCGACTTGTACGGCGGCAACGAGCCATCCCGC 

ACCTGGTTCGCCGCGATGAAGCCGATCGCCAACAACTTCGGCGAAGTGCAGCTACCACCGACC 

GATCCACGCTATGTCGACGGCGCACCAGGCTCACGGGTACCAAGCGTGGCCGGTCTGGATGTC 

GACGCCGCACGCCAGCGCCTCAAGGACGCGGGCTTCCAGGTCGCCGACCAAACCAACTCGGT 

CAACAGCTCCGCCAAGTATGGTGAGGTGGTCGGAACGTCGCGCAGCGGTCAAACAATTCCGGG 

TTCGATCGTCACGATCCAGATCAGCAACGGCATCCCGCCGGCTCCGCCTCCGCCACCGCTGCC 

TGAGGATGGTGGGCCGCCACCGCCGGTCGGATCGCAGGTGGTGGAGATTCCGGGGCTGCCGC 

CGATCACCATTCCGCTGCTGGCGCCACCACCCCCAGCGCGTCCCCCGTAG 

>Rv3721c dnaZX DNA polymerase III, [gamma] (dnaZ) and [tau] (dnaX) TB.seq 4164995:4166728

MW:61892 >emb|AL123456|MTBH37RV:c4166728-4164992, dnaZX SEQ ID NO:138

GTGGCTCTCTACCGCAAGTACCGACCGGGAAGCTTCGCGGAGGTGGTGGGGCAGGAGCACGT 

CACCGCGCCGCTGTCGGTGGCGCTGGATGCCGGCCGGATCAACCACGCGTACCTGTTCTCTGG 

GCCGCGTGGCTGCGGAAAGACGTCGTCAGCGCGTATCCTGGCGCGGTCGTTGAACTGTGCGCA 

GGGCCCTACCGCCAAGCCGTGCGGGGTCTGCGAATCCTGCGTTTCGTTGGCGCCCAACGCCCC 

CGGCAGCATCGACGTGGTAGAGGTGGATGCCGCCAGCCACGGCGGCGTGGACGACACCCGCG 

AGCTGCGGGACCGCGCGTTCTATGCGCCGGTCCAGTCACGGTACCGGGTATTTATCGTCGACG 

AGGCGCACATGGTGACCACCGCGGGATTCAACGCGCTGCTCAAGATCGTGGAGGAACCGGGC 

GAACACCTGATCTTCATATTCGCCACCACCGAACCGGAGAAGGTACTGCCGACGATTCGGTCGC 

GCACTCATCACTACCCGTTCCGGCTGCTGCCGCCGCGCACTATGCGGGCGTTGCTCGCGCGGA 

TCTGCGAGCAGGAGGGCGTCGTCGTCGACGATGCGGTGTACCCGTTGGTGATCCGGGCCGGC 

GGAGGTTCCCCACGGGATACGCTCTCGGTGCTGGACCAATTGCTGGCTGGGGCCGCGGACAC 

CCACGTGACCTACACCCGGGCGCTGGGGCTGCTGGGTGTCACCGACGTCGGCCTGATCGACG 

ACGCGGTCGACGCACTGGCCGCTTGCGATGCGGCCGCATTGTTCGGGGCGATCGAATCGGTGA 
TCGATGGCGGACATGACCCTCGGCGTTTCGCTACCGATCTGCTGGAGCGATTCCGCGACCTGA
TTGTGCTGCAATCGGTTCCCGACGCGGCATCTCGCGGGGTGGTGGATGCGCCCGAAGACGCG 
CTGGATCGGATGCGCGAGCAAGCCGCCCGGATCGGGCGGGCGACCCTGACCCGATATGCCGA 
GGTGGTGCAGGCCGGGCTAGGCGAGATGCGCGGTGCGACCGCGCCGCGTCTGCTGCTGGAA 
GTGGTTTGCGCGCGACTGCTGCTGCCCTCGGCGAGCGACGCCGAATCGGCACTGTTGCAGCG
GGTCGAACGGATCGAGACCCGGTTGGACATGTCGATCCCGGCGCCGCAAGCCGTACCACGCC 
CGTCGGCTGCGGCTGCCGAGCCGAAACACCAGCCCGCGCGTGAACCGAGACCGGTGCTGGCC 
CCCACACCGGCCTCGAGCGAACCCACCGTGGCCGCGGTTCGGTCCATGTGGCCGACGGTGCG 
CGACAAGGTGCGCCTGCGCAGCCGTACCACCGAGGTGATGCTGGCGGGTGCCACCGTCCGTG 

CGCTAGAGGACAACACGCTGGTGCTGACCCACGAATCGGCGCCGCTGGCGCGGCGGCTGTCC
GAACAGCGCAACGCCGATGTCCTCGCCGAGGCGCTTAAAGACGCGCTGGGAGTCAACTGGCG 
GGTGCGGTGTGAGACCGGTGAACCGGCTGCGGCGGCATCACCCGTGGGCGGGGGAGCGAAC 
GTGGCGACCGCCAAGGCCGTAAACCCTGCCCCCACAGCGAATTCCACTCAGCGCGACGAAGAG 
GAGCACATGCTCGCCGAAGCCGGCCGTGGCGACCCGTCGCCGCGTCGCGACCCGGAAGAGGT 

TGCACTCGAGCTGCTGCAGAACGAGCTGGGCGCGCGCCGGATAGACAACGCCTAG

>Rv3783 - TB.seq 4229255:4230094 MW:32337 

>emb|AL123456|MTBH37RV:4229255-4230097. Rv3783 SEQ ID NO:139 

ATGACATTCATGGATGCTCAAGCTAGCTTCCAGACACAGTCGCGGACACTGGCCCGCGTCCGA 
GGCGATCTGGTCGACGGGTTCCGCCGCCACGAGCTGTGGCTGCACCTGGGCTGGCAGGACAT
CAAGCAGCGGTACCGCCGCTCGGTGCTGGGGCCGTTCTGGATCACCATCGCCACCGGAACGA 
CCGCCGTCGCGATGGGCGGCCTGTATTCCAAGCTGTTTCGGCTCGAGCTGTCTGAGCACCTGC 
CCTACGTCACGCTCGGGCTGATCGTCTGGAACCTGATCAACGCCGCCATCCTGGACGGCGCAG 
AGGTTTTCGTCGCCAACGAAGGTCTGATCAAACAGCTGCCGGCACCGTTGAGCGTGCACGTCTA 
TCGGTTGGTGTGGCGGCAGATGATCTTCTTCGCCCACAACATCGTCATCTACTTCGTCATCGGG
ATCATCTTTCCTAAGCCGTGGTCGTGGGCX3GATCTGTCGTTTCTTCCGGCGCTGGCGCTCATTT 
TCCTCAATTGCGTTTGGGTGTCACTGTGTTTCGGCATCCTGGCGACCCGCTACCGCGACATCGG 
CCCGCTGCTGTTTTCCGTTGTGCAGTTGTTGTTCTTCATGACGCCGATCATCTGGAACGACGAGA 
CCCTGCGTCGGCAGGGCGCGGGCCGCTGGTCGAGCATCGTCGAGCTCAACCCGCTGCTGCAC 
TATCTGGACATCGTGCGGGCGCCACTGTTGGGCGCTCACCAGGAGCTGCGGCAGTGGCTGGTG
GTGCTGGTGTTGACCGTCGTCGGCTGGATGCTGGCGGCGTTCGCGATGCGGCAGTATCGCGC 
GCGGGTGCCCTACTGGGTGTAG 

>Rv3789 - TB.seq 4235371:4235733 MW:13378
>emb|AL123456|MTBH37RV:4235371-4235736, Rv3789 SEQ ID NO:140

ATGCGGTTCGTTGTCACCGGCGGCCTCGCTGGGATAGTTGACTTTGGCCTCTACGTCGTGCTGT 
ACAAGGTGGCGGGCCTACAGGTCGACCTGTCCAAGGCCATCAGCTTCATCGTCGGCACCATCA 
CCGCGTACCTGATCAACCGCXJGGTGGACATTCCAGGCCGAGCCCAGCACGGCCCGATTCGTCG
CGGTCATGCTCGTCTACGGAATCACCTTGGCCGTGCAGGTCGGACTCAACCACCTCTGCCTCGC 
ACTCTTGCACTACCGGGCGTGGGCCATCCCCGTCGCGTTTGTGATCGCGCAGGGCACCGCCAC 
GGTAATCAACTTCATCGTGCAGCGAGCCGTGATCTTCCGGATCCGCTGA 

>Rv3790 - TB.seq 4235776:4237158 MW:50164

>emb|AL123456|MTBH37RV:4235776-4237161. Rv3790 SEQ ID NO:141 

ATGTTGAGCGTGGGAGCTACCACTACCGCCACCCGGCTGACCGGGTGGGGGCGCACAGCGCC 

GTCGGTGGCGAATGTGCTTCGCACCCCAGATGCCGAGATGATCGTCAAGGCGGTGGCTCGGGT 

CGCCGAGTCGGGGGGCGGCCGGGGTGCTATCGCGCGCGGGCTGGGCCGCTCCTATGGGGAC 

AACGCCCAAAACGGCGGTGGGTTGGTGATCGACATGACGCCGCTGAACACTATCCACTCCATTG 

ACGCCGAGACCAAGCTGGTCGACATCGACGCCGGGGTCAACCTCGACCAACTGATGAAAGCCG 

CCCTGCCGTTCGGGCTGTGGGTCCCGGTGCTGCCGGGAACCCGGCAGGTCACCGTCGGCGGG 

GCGATCGCCTGCGATATCCACGGCAAGAACCATCACAGCGCTGGCAGCTTCGGTAACCACGTG 

CGCAGCATGGACCTGCTGACCGCCGACGGCGAGATCCGTCATCTCACTCCGACCGGCGAGGA 

CGCCGAACTGTTCTGGGCCACCGTCGGGGGCAACGGTCTCACCGGCATCATCATGCGGGCCAC 

CATCGAGATGACGCCCACTTCGACGGCGTACTTCATCGCCGACGGCGACGTCACCGCCAGCCT 

CGACGAGACCATCGCCCTGCACAGCGACGGCAGCGAAGCGCGCTACACCTATTCGAGTGCCTG 

GTTCGACGCGATCAGGGCTCCCCCGAAGCTGGGCCGCGCGGCGGTATCGCGTGGCCGCCTGG 

CCACCGTCGAGCAATTGCCTGCGAAACTGCGGAGCGAACCTTTGAAATTCGATGCGCCACAGCT 

ACTTACGTTGCCCGACGTGTTTCCCAACGGGCTGGCCAACAAATATACCTTCGGCCCGATCGGC 

GAACTGTGGTACCGCAAATCCGGCACCTATCGCGGCAAGGTCCAGAACCTCACGCAGTTCTACC 

ATCCGCTGGACATGTTCGGCGAATGGAACCGCGCCTACGGCCCAGCGGGCTTCCTGCAATATC 

AGTTCGTGATCCCCACAGAGGCGGTTGATGAGTTCAAGAAGATCATCGGCGTTATTCAAGCCTC 

GGGTGACTACTCGTTTCTCAACGTGTTCAAGCTGTTCGGCCCCCGCAACCAGGCGCCGCTCAGC 

TTCCCCATCCCGGGCTGGAACATCTGCGTCGACTTCCCCATCAAGGACGGGCTGGGGAAGTTC 

GTCAGCGAACTCGACCGCCGGGTACTGGAATTCGGCGGCCGGGTCTACACCGCCAAAGACTCC 

CGTACCACCGCCGAAACCTTTCATGCCATGTATCCGCGCGTCGACGAATGGATCTCCGTGCGCC 

GCAAGGTCGATCCGCTGGGCGTATTCGCCTCCGACATGGCCCGACGCTTGGAGCTGCTGTAG 

>Rv3791 - TB.seq 4237162:4237923 MW:27470 

>emb|AL123456|MTBH37RV:4237162-4237926, Rv3791 SEQ ID NO:142

ATGGTTGTTGATGCCGTAGGAAACCCCCAGACGGTGCTGCTGCTCGGTGGCACCTCCGAGATC 

GGGCTCGCGATCTGCGAGCGCTAGCTGCACAATTCGGCGGCCCGCATCGTGCTGGGCTGCCTG 

CCCGACGACCCACGGCGGGAGGACGCGGCCGCTGCGATGAAGCAGGCCGGCGCGCGGTCGG 

TGGAGCTGATCGACTTTGACGCCCTGGATACCGACAGCCACCCGAAGATGATCGAGGCGGCCT 

TCTCCGGGGGTGATGTGGACGTGGCTATCGTCGCGTTCGGCTTGCTCGGCGACGCGGAAGAGG 
TGTGGCAGAACCAGCGCAAGGCGGTGCAGATCGCCGAAATCAACTACACCGCAGCGGTTTCGG
TGGGCGTGCTGCTGGCTGAGAAGATGCGCGCTCAGGGCTTCGGTCAGATCATCGCGATGAGCT 
CGGCCGCCGGTGAGCGGGTGCGACGGGCGAACTTCGTCTACGGCTCCACCAAGGCCGGTCTG 
GACGGGTTTTACCTGGGGTTGTCAGAAGCGCTGCGCGAGTACGGTGTTCGTGTGCTGGTGATC 
CGGCCCGGCCAGGTGCGTACCCGGATGAGCGCGCACCTCAAGGAAGCTCCATTGACCGTCGA
CAAGGAGTACGTCGCCAACCTCGCGGTGACCGCGTCCGCAAAAGGTAAGGAATTGGTTTGGGC 
GCCAGCAGCGTTCCGCTACGTCATGATGGTGTTGCGTCACATCCCGCGGAGCATCTTCCGCAA 
GCTGCCCATCTGA 

>Rv3794 embA TB.seq 4243230:4246511 MW:115694

>emb|AL123456|MTBH37RV:4243230-4246514, embA SEQ ID NO:143

gtgccccacgacggtaatgagcgatctcaccggatcgcacgcctagcagccgtcgtctgggga 
atcgcgggtctgctgctgtgcggcatcgttccgctgcttccggtgaaccaaaccaccgcgacc 
atcttctggccgcagggcagcaccgcx:gacggcaacatcacccagatcaccgcccctctggta 
tccggggcgccacgcgggctggacatctcgatcccctgctcggccatcgccacgctgcccgc
caacggcggcctggtgctgtccacactgccggccggtggcgtggataccggtaaggcggggc 

TGTTCGTCCGCGCCAACCAGGACACGGTCGTCGTGGCGTTCCGCGACTCGGTGGCGGCGGTG 

gcggcccgctccacgatcgcagcgggaggctgtagcgcgctgcatatctgggccgataccgg 
cggcgcggggggtgattttatgggtatacccggcggcgccgggaccctgccgccggagaaga 

AGCCACAGGTTGGCGGCATCTTCACCGACCTGAAGGTCGGAGCGCAGCCCGGGCTGTCGGCC
CGCGTCGACATCGACACTCGGTTTATCACGACGCCCGGCGCGCTCAAGAAGGCCGTGATGCTC 
CTCGGCGTGCTGGCGGTCCTGGTAGCCATGGTGGGGCTGGCCGCGCTGGACCGGCTCAGCAG 
GGGCCGCACCCTGCGCGACTGGCTGACCCGATATCGCCCGCGGGTGCGGGTCGGATTCGCCA 
GCCGGCTCGCTGACGCAGCGGTGATCGCGACCTTGTTGCTCTGGCATGTCATCGGCGCCACCT 

CGTCCGATGACGGCTACCTTCTGACCGTCGCCCGGGTCGCCCCGAAGGCCGGCTATGTAGCCA
ACTACTACCGGTATTTCGGCACGACGGAGGCGCCGTTCGACTGGTATACATCGGTGCTTGCCCA 
GCTGGCGGCGGTGAGCACCGCCGGCGTCTGGATGCGCCTGCCCGCCACCCTGGCCGGAATCG 
CCTGCTGGCTGATCGTCAGCCGTTTCGTGCTGCGGCGGCTGGGACCGGGCCCGGGCGGGCTG 
GCGTCCAACCGGGTCGCTGTGTTCACCGCTGGTGCGGTGTTCCTGTCCGCCTGGCTGCCGTTC 

AACAACGGCCTGCGTCCCGAGCCGCTGATCGCGCTGGGTGTGCTGGTCACGTGGGTGTTGGTG
GAACGGTCGATCGCGCTCGGACGGCTGGCCCCGGCCGCGGTAGCCATCATCGTGGCGACGCT 
TACCGCGACGCTGGCACCGCAGGGGTTGATCGCGCTGGCCCCGCTGCTGACTGGTGCGCGCG 
CCATCGCCCAGAGGATCCGGCGCCGCCGGGCGACCGATGGACTGCTGGCGCCGCTGGCGGT 
GCTGGCCGCGGCGTTGTCGCTGATCACCGTGGTGGTGTTTCGGGACCAGACGCTGGCCACGGT 

GGCCGAATCGGCACGCATCAAGTACAAGGTCGGCCCGACCATCGCCTGGTACCAGGACTTCCT
GCGCTACTACTTCCTTACCGTGGAGAGCAACGTTGAGGGGTCGATGTCCCGCCGGTTCGCGGT 
GCTGGTGTTGCTGTTCTGCCTGTTCGGGGTGCTGTTCGTGCTGCTGCGGCGCGGCCGGGTGGC 
GGGGCTGGCCAGCGGCCCGGCCTGGCGACTGATCGGCACTACGGCGGTCGGCCTGCTGCTGC
TCACGTTCACGCCAACCAAGTGGGCCGTGCAGTTCGGCGCATTCGCGGGGCTGGCCGGGGTGT 
TGGGTGCGGTCACCGCGTTCACCTTTGCCCGCATCGGTGTACATAGTCGACGCAACCTCACGCT 
GTACGTGACCGCGTTGCTGTTCGTGCTGGCGTGGGCAACCTCGGGCATCAACGGGTGGTTCTA 
CGTCGGCAACTACGGGGTGCCGTGGTATGACATCCAGCXJCGTCATCGCCAGCCACCCGGTGAC
GTCGATGTTTCTGACGCTGTCGATCCTCACCGGATTGCTGGCAGCCTGGTATCACTTCCGGATG 
GACTACGCCGGGCACACCGAAGTCAAAGACAACCGGCGCAACCGCATCTTGGCCTCTACGCCA 
CTGCTGGTGGTCGCGGTGATCATGGTCGCAGGCGAAGTCGGCTCGATGGCCAAGGCCGCGGT 
GTTCCGTTACCCGCTTTACACCACCGCCAAGGCCAACCTGACCGCGCTCAGCACCGGGCTGTC 

CAGGTGTGCGATGGCCGACGACGTGCTGGCCGAGCCCGACCCCAATGCCGGCATGCTGCAAC
CGGTTCCGGGCCAGGCGTTCGGACCGGACGGACCGCTGGGCGGTATCAGTCCCGTCGGCTTC 
AAACCCGAGGGCGTGGGCGAGGACCTCAAGTCCGACCCGGTGGTCTCCAAACCCGGGCTGGT 
CAACTCCGATGCGTCGCCCAACAAACCCAACGCCGCCATCACCGACTCCGCGGGCACCGCCGG 
AGGGAAGGGCCCGGTCGGGATCAACGGGTCGCACGCGGCGCTGCCGTTCGGATTGGACCCGG 

CACGTACCCCGGTGATGGGCAGCTACGGGGAGAACAACCTGGCCGCCACGGCCACCTCGGCC
TGGTACCAGTTACCGCCCCGCAGCCCGGACCGGCCGGTGGTGGTGGTTTCCGCGGCCGGCGC 
CATCTGGTCCTACAAGGAGGAGGGCGATTTCATCTACGGCCAGTCCCTGAAACTGCAGTGGGG 
CGTCACCGGCCCGGACGGCCGGATCCAGCCACTGGGGCAGGTATTTCCGATCGACATCGGACC 
GCAACCCGCGTGGCGCAATCTGCGGTTTCCGCTGGCCTGGGCGCCGCCGGAGGCCGACGTGG 

CGCGCATTGTCGCCTATGACCCGAACCTGAGCCCTGAGCAATGGTTCGCCTTCACCCCGCCCC
GGGTTCCGGTGCTGGAATCTCTGGAGCGGTTGATCGGGTCAGCGACACCGGTGTTGATGGACA 
TCGCGACCGCAGCCAACTTCCCCTGCCAGCGACCGTTTTCCGAGCATCTCGGCATTGCCGAGC 
TTCCGCAGTACCGGATCCTGCCGGACCACAAGCAGACGGCGGCGTCGTCGAACCTATGGCAGT 
CCAGCTCGACCGGCGGTCCGTTCCTGTTCACCCAGGCGCTGCTGCGCACCTCGACGATCGCCA 

CGTACCTGCGTGGGGACTGGTATCGCGACTGGGGATCGGTGGAGCAGTACCACCGGCTGGTG
CCGGCCGATCAGGCTCCAGACGCCGTTGTCGAGGAGGGCGTGATCACTGTGCCCGGCTGGGG 
TCGGCCAGGACCGATCAGGGCGCTGCCATGA 

>Rv3795 embB TB.seq 4246511:4249804 MW:118023

>emb|AL123456|MTBH37RV:4246511-4249807, embB SEQ ID NO:144

ATGACACAGTGCGCGAGCAGACGCAAAAGCACCCCAAATCGGGCGATTTTGGGGGCTTTTGCG 
TCTGCTCGCGGGACGCGCTGGGTGGCCACCATCGCCGGGCTGATTGGCTTTGTGTTGTCGGTG 
GCGACGCCGCTGCTGCCCGTCGTGCAGACCACCGCGATGCTCGACTGGCCACAGCGGGGGCA 
ACTGGGCAGCGTGACCGCCCCGCTGATCTCGCTGACGCCGGTCGACTTTACCGCCACCGTGCC 

GTGCGACGTGGTGCGCGCCATGCCACCCGCGGGCGGGGTGGTGCTGGGCACCGCACCCAAG
CAAGGCAAGGACGCCAATTTGCAGGCGTTGTTCGTCGTCGTCAGCGCCCAGCGCGTGGACGTC 
ACCGACCGCAACGTGGTGATCTTGTCCGTGCCGCGCGAGCAGGTGACGTCCCCGCAGTGTCAA 
CGCATCGAGGTCACCTCTACCCACGCCGGCACXJTTCGCCAACTTCGTCGGGCTCAAGGACCCG
TCGGGCGCGCCGCTGCGCAGCGGCTTCCCCGACCCCAACCTGCGCCCGCAGATTGTCGGGGT 
GTTCACCGACCTGACCGGGCCCGCGCCGCCCGGGCTGGCGGTCTCGGCGACCATCGACACCC 
GGTTCTCCACCCGGCCGACCACGCTGAAACTGCTGGCGATCATCGGGGCGATCGTGGCCACCG 
TCGTCGCACTGATCGCGTTGTGGCGCCTGGACCAGTTGGACGGGCGGGGCTCAATTGCCCAGC
TCCTCCTCAGGCCGTTCCGGCCTGCATCGTCGCCGGGCGGCATGCGCCGGCTGATTCCGGCAA 
GCTGGCGCACCTTCACCCTGACCGACGCCGTGGTGATATTCGGCTTCCTGCTGTGGCATGTCAT 
CGGCGCGAATTCGTCGGACGACGGCTACATCCTGGGCATGGCCCGAGTCGCCGACCACGCCG 
GCTACATGTCCAACTATTTCCGCTGGTTCGGCAGCCCGGAGGATCCCTTCGGCTGGTATTACAA 

CCTGCTGGCGCTGATGACCCATGTCAGCGACGCCAGTCTGTGGATGCGCCTGCCAGACCTGGC
CGCCGGGCTAGTGTGCTGGCTGCTGCTGTCGCGTGAGGTGCTGCCCCGCCTCGGGCCGGCGG 
TGGAGGCCAGCAAACCCGCCTACTGGGCGGCGGCCATGGTCTTGCTGACCGCGTGGATGCCG 
TTCAACAACGGCCTGCGGCCGGAGGGCATCATCGCGCTCGGCTCGCTGGTCACCTATGTGCTG 
ATCGAGCGGTCCATGCGGTACAGCCGGCTCACACCGGCGGCGCTGGCCGTCGTTACCGCCGC 

ATTCACACTGGGTGTGCAGCCCACCGGCCTGATCGCGGTGGCCGCGCTGGTGGCCGGCGGCC
GCCCGATGCTGCGGATCTTGGTGCGCCGTCATCGCCTGGTCGGCACGTTGCGGTTGGTGTCGC 
CGATGCTGGCCGCCGGCACCGTCATCCTGACCGTGGTGTTCGCCGACCAGACCCTGTCAACGG 
TGTTGGAAGCCACCAGGGTTCGCGCCAAAATCGGGCCGAGCCAGGCGTGGTATACCGAGAACC 
TGCGTTACTACTACCTCATCCTGCCCACCGTCGACGGTTCGCTGTCGCGGCGCTTCGGC I I I I I 

GATCACCGCGCTATGCCTGTTCACCGCGGTGTTCATCATGTTGCGGCGCAAGCGAATTCCCAGC
GTGGCCCGCGGACCGGCGTGGCGGCTGATGGGCGTCATCTTCGGCACCATGTTCTTCCTGATG 
TTCACGCCCACCAAGTGGGTGCACCACTTCGGGCTGTTCGCCGCCGTAGGGGCGGCGATGGC 
CGCGCTGACGACGGTGTTGGTATCCCCATCGGTGCTGCGCTGGTCGCGCAACCGGATGGCGTT 
CCTGGCGGCGTTATTCTTCCTGCTGGCGTTGTGTTGGGCCACCACCAACGGCTGGTGGTATGTC 

TCCAGCTACGGTGTGCCGTTCAACAGCGCGATGCCGAAGATCGACGGGATCACAGTCAGCACA
AT C I I I I IC GCCCTGTTTGCGATCGCCGCCGGCTATGCGGCCTGGCTGCACTTCGCGCCCCGC 
GGCGCCGGCGAAGGGCGGCTGATCCGCGCGCTGACGACAGCCCCGGTACCGATCGTGGCCG 
GTTTCATGGCGGCGGTGTTCGTCGCGTCCATGGTGGCCGGGATCGTGCGACAGTACCCGACCT 
ACTCCAACGGCTGGTCCAACGTGCGGGCGTTTGTCGGCGGCTGCGGACTGGCCGACGACGTA 

CTCGTCGAGCCTGATACCAATGCGGGTTTCATGAAGCCGCTGGACGGCGATTCGGGTTCTTGG
GGCCCCTTGGGCCCGCTGGGTGGAGTCAACCCGGTCGGCTTCACGCCCAACGGCGTACCGGA 
ACACACGGTGGCCGAGGCGATCGTGATGAAACCCAACCAGCCCGGCACCGACTACGACTGGGA 
TGCGCCGACCAAGCTGACGAGTCCTGGCATCAATGGTTCTACGGTGCCGCTGCCXJTATGGGCT 
CGATCCCGCCCGGGTACCGTTGGCAGGCACCTACACCACCGGCGCACAGCAACAGAGCACACT 

CGTCTCGGCGTGGTATCTCCTGCCTAAGCCGGACGACGGGCATCCGCTGGTCGTGGTGACCGC
CGCGGGCAAGATCGCCGGCAACAGCGTGCTGCACGGGTACACCCCCGGGCAGACTGTGGTGC 
TCGAATACGCCATGCCGGGACCCGGAGCX3CTGGTACCCGCCGGGCGGATGGTGCCCGACGAC 
CTATACGGAGAGCAGCCCAAGGCGTGGCGCAACCTGCGCTTCGCCCGAGCAAAGATGCCCGC
CGATGCCGTCGCGGTCCGGGTGGTGGCCGAGGATCTGTCGCTGACACCGGAGGACTGGATCG 
CGGTGACCCCGCCGCGGGTACCGGACCTGCGCTCACTGCAGGAATATGTGGGCTCGACGCAG 
CCGGTGCTGCTGGACTGGGCGGTCGGTTTGGCCTTCCCGTGCCAGCAGCCGATGCTGCACGC 
CAATGGCATCGCCGAAATCCCGAAGTTCCGCATCACACCGGACTACTCGGCTAAGAAGCTGGAC
ACCGACACGTGGGAAGACGGCACTAACGGCGGCCTGCTCGGGATCACCGACCTGTTGCTGCG 
GGCCCACGTCATGGCCACCTACCTGTCCCGCGACTGGGCCCGCGATTGGGGTTCCCTGCGCAA 
GTTCGACACCCTGGTCGATGCCCCTCCCGCCCAGCTCGAGTTGGGCACCGCGACCCGCAGCG 
GCCTGTGGTCACCGGGCAAGATCCGAATTGGTCCATAG 


>Rv3834c serS seryl-tRNA synthase TB.seq 4307655:4308911 MW:45293
>emb|AL123456|MTBH37RV:c4308911-4307652, serS SEQ ID NO:145

GTGATCGACCTGAAGCTGCTTCGTGAAAACCCCGACGCGGTACGCCGCTCAGAACTCAGCCGC 
GGCGAGGACCCGGCGCTGGTAGATGCCCTGCTGACGGCCGACGCCGCCCGCCGGGCCGTGA 

TCTCGACCGCCGATTCGTTACGGGCCGAGCAGAAAGCCGCCAGCAAAAGCGTGGGTGGCGCG
TCTCCCGAAGAGCGCCCGCCGCTGCTGCGGCGCGCGAAGGAACTCGCCGAGCAGGTCAAAGC 
CGCTGAGGCCGACGAGGTCGAAGCGGAGGCGGCGTTCACCGCGGCGCACCTGGCGATCTCGA 
ATGTCATCGTGGACGGGGTACCCGCCGGCGGGGAGGACGACTACGCGGTGCTCGACGTCGTC 
GGCGAGCCCAGCTACCTCGAGAACCCCAAGGACCACCTGGAGCTCGGCGAGTCGCTGGGCCT 

GATCGACATGCAGCGCGGCGCCAAGGTGTCGGGTTCACGGTTCTACTTCCTGACCGGTCGGGG
TGCCCTACTGCAGCTTGGATTGCTGCAGCTGGCGCTGAAGCTAGCCGTCGACAACGGCTTTGTC 
CGTACGATCCCGCCGGTGCTGGTGCGCCCGGAAGTGATGGTAGGCACGGGATTTCTAGGCGCC 
CACGCCGAGGAGGTGTACCGGGTAGAGGGCGACGGCCTCTACCTTGTGGGCACCTCCGAGGT 
ACCGCTGGCGGGGTATCACTCCGGCGAGATTCTGGACCTTTCCCGCGGGCCGCTGCGGTATGC 

GGGCTGGTCGTCGTGTTTCCGACGTGAGGCCGGCAGCCATGGCAAGGACACGCGCGGCATCA
TCCGGGTGCACCAGTTCGACAAAGTCGAGGGCTTCGTCTACTGCACACCGGCCGACGCGGAGC 
ACGAACATGAGCGGCTGCTGGGCTGGCAGCGCCAGATGCTGGCACGCATCGAGGTGCCGTAT 
CGGGTCATCGACGTGGCCGCGGGTGATCTCGGCTCGTCGGCCGCCCGCAAGTTCGACTGCGA 
GGCGTGGATTCCGACGCAGGGGGCCTATCGCGAGCTGACGTCGACGTCGAACTGCACCACCTT 

TCAGGCGCGCCGGTTGGCGACCCGCTACCGGGATGCCAGCGGCAAGCCGCAGATCGCGGCCA
CCCTCAACGGAACGCTGGCCACCACCCGGTGGCTGGTTGCGATCCTGGAGAACCACCAGCGG 
CCCGACGGCAGCGTTAGAGTCCCGGACGCACTGGTTCCGTTCGTGGGTGTCGAAGTGCTGGAG 
CCGGTCGCTTAG 

>Rv3907c pcnA polynucleotide polymerase TB.seq 4391631:4393070 MW:53057
>emb|AL123456|MTBH37RV:c4393070-4391628. pcnA SEQ ID NO:146 
GTGCCGGAAGCCGTCCAGGAAGCCGATCTGCTAACCGCCGCTGCGGTTGCCTTGAACAGGCAT
GCTGCCTTATTGCGGGAACTCGGGTCGGTGTTCGCCGCCGCGGGACACGAGTTGTATCTGGTC 
GGCGGTTCGGTGCGAGATGCACTGTTGGGCCGGTTGAGCCCCGACCTGGACTTCACCACCGAC 
GCCCGTCCCGAGCGGGTGCAGGAGATCGTGCGGCCGTGGGCCGATGCGGTGTGGGATACCG 
GAATCGAATTCGGCACCGTCGGCGTGGGTAAGAGCGACCACCGCATGGAGATCACCACATTCC
GTGCCGACAGCTACGACCGGGTTTCGCGTCATCCAGAGGTACGTTTCGGCGATTGCCTCGAGG 
GCGATCTGGTCCGCCGCGACTTCACCACGAACGCAATGGCTGTGCGCGTCACCGCCACTGGGC 
CGGGCGAATTCCTGGATCCGCTTGGTGGCTTGGCGGCGCTGCGGGCCAAGGTGTTAGACACCC 
CGGCGGCGCCGTCGGGGTCCTTTGGCGACGATCCGTTGCGGATGCTGCGCGCCGCGCGGTTC 

GTCTCGCAACTTGGATTCGCGGTGGCGCCGCGGGTGCGCGCGGCGATCGAAGAGATGGCGCC
GCAGTTGGCCCGAATCAGCGCCGAACGGGTGGCCGCCGAGCTGGACAAGCTGCTGGTCGGTG 
AGGATCCGGCCGCGGGTATCGACCTGATGGTGCAGAGCGGTATGGGTGCTGTGGTCTTGCCTG 
AAATCGGTGGGATGCGGATGGCGATCGACGAACATCACCAGCACAAGGACGTCTATCAGCATTC 
CTTGACCGTGCTGCGGCAGGCGATCGCGCTGGAGGACGACGGCCCGGATCTGGTGTTGCGCT 

GGGCGGCGCTGCTGCACGACATCGGCAAGCCCGCCACCCGCCGTCACGAACCCGACGGTGGG
GTGAGCTTCCATCACCACGAAGTGGTCGGCGCCAAGATGGTGCGCAAGCGGATGCGGGCGCT 
GAAGTATTCCAAGCAGATGATCGACGACATCTCGCAGCTGGTCTACCTGCATCTGCGGTTTCAC 
GGCTACGGCGATGGGAAATGGACCGACTCTGCGGTGCGCCGCTATGTCACCGACGCCGGGGC 
CCTACTGCCACGGCTGCACAAGCTGGTGCGCGCCGACTGCACGACCCGCAACAAGCGCCGGG 

CCGCGCGGTTGCAGGCCAGTTACGACCGGCTGGAAGAGCGGATCGCGGAGCTGGCCGCCCAG
GAGGATCTGGATCGGGTGCGCCCCGACCTGGACGGCAACCAGATCATGGCGGTGCTCGACATT 
CCGGCGGGCCCGCAAGTCGGCGAGGCGTGGCGCTACTTGAAGGAGCTGCGGCTAGAGCGCG 
GCCCGTTGTCCACCGAGGAGGCGACAACCGAGCTGCTGTCCTGGTGGAAATCACGGGGGAAC 
CGCTAG 


TABLE 4 

>Rv0002 dnaN DNA polymerase III, b-subunit TB.seq 2052:3257 MW:42114 SEQ ID NO:147
MDAATTRVGLTDLTFRLLRESFADAVSWVAKNLPARPAVPVLSGVLLTGSDNGLTISGFDYEVSAEA 
QVGAEIVSPGSVLVSGRLLSDITRALPNKPVDVHVEGNRVALTCGNARFSLPTMPVEDYPTLPTLPEE 
TGLLPAELFAEAISQVAIAAGRDDTLPMLTGIRVEILGETWLAATDRFRLAVRELKWSASSPDIEAAVL
VPAKTLAEAAKAGIGGSDVRLSLGTGPGVGKDGLLGISGNGKRSTTRLLDAEFPKFRQLLPTEHTAVA 
TMDVAELIEAIKLVALVADRGAQVRMEFADGSVRLSAGADDVGRAEEDLWDYAGEPLTIAFNPTYLT 
DGLSSLRSERVSFGFTTAGKPALLRPVSGDDRPVAGLNGNGPFPAVSTDYVYLLMPVRLPG 

>Rv0003 recF DNA replication and SOS induction TB.seq 3280:4434 MW:42181 SEQ ID NO:148
VYVRHLGLRDFRSWACVDLELHPGRTVFVGPNGYGKTNLIEALWYSTTLGSHRVSADLPLIRVGTDR 
AVISTIWNDGRECAVDLEIATGRVNKARLNRSSVRSTRDWGVLRAVLFAPEDLGLVRGDPADRRR 
YLDDU^VRRPAIAA\mAEYERVLRQRTALlJ<SVPGARYRGDRGVFDTLEVVVDSRLAEHGAELVAARI
DLVNQLAPEVKKAYQLUVPESRSASIGYRASMDVTGPSEQSDIDRQLl^VARLLAALAARRDAELERG 
VCLVGPHRDDLILRLGDQPAKGFASHGEAWSLAVALRLAAYQLLRVDGGEPVLLLDDVFAELDVMRR 
RALATAAESAEQVLVTAAVLEDIPAGWDARRVHIDVRADDTGSMSWLP 


>Rv0005 gyrB DNA gyrase subunit B TB.seq 5123:7264 MW:78441 SEQ ID NO:149
MGKNEARRSALAPDHGTWCDPLRRLNRMHATPEESIRIVAAQKKKAQDEYGAASITILEGLEAVRKR 
PGMYIGSTGERGLHHLIWEWDNAVDEAMAGYATTVhTy/VLLEDGGVEVADDGRGIPVATHAS 
DNA^QLHAGGKFDSDAYAISGGLHGVGVSWNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGA 
PTKKTGSTVRFWADPAVFETTEYDFEWARRLQEMAFLNKGLTINLTDERWQDEWDEWSDVA^
PKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAG 
YSESVHTFANTINTHEGGTHEEGFRSALTSWNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSE 
PQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKWVNKAVSSAQARIAARKARELVRRK 
SATDIGGLPGKLADCRSTDPRKSELYWEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLK 
NTEVQAIITALGTGIHDEFDIGKLRYHKI\/LMADADVDGQHISTll.LTLLFRFMRPLIENGhlVFLAQPPLY
KLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQ 
VTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV 

>Rv0006 gyrA DNA gyrase subunit A TB.seq 7302:9815 MW:92276 SEQ ID NO:150
MTDTTLPPDDSLDRIEPVDIEQEMQRSYIDYAMSVIVGRALPEVRDGLKPVHRRVLYAMFDSGFRPD 
RSHAKSARSVAETMGNYHPHGDASIYDSLVRMAQPWSLRYPLVDGQGNFGSPGNDPPAAMRYTEA 
RLTPLAMEMLREIDEETVDFIPNYDGRVQEPTVLPSRFPNLLANGSGGIAVGMATNIPPHNLRELADA 
VFWALENHDADEEETLAAVMGRVKGPDFPTAGLIVGSQGTADAYKTGRGSIRMRGWEVEEDSRG 
RTSLVITELPYQVNHDNFITSIAEQVRDGKLAGISNIEDQSSDRVGLRIVIEIKRDAVAKWINNLYKHTQ 
LQTSFGANMLAI\mGVPRTLRLDQLIRYYVDHQLDVl\mRTTYRLRKANERAHILRGLVKALDALDEVI 
ALIRASETVDIARAGLIELLDIDEIQAQAILDMQLRRLAALERQRIIDDLAKIEAEIADLEDILAKPERQRGI 
VRDELAEIVDRHGDDRRTRIIAADGDVSDEDLIAREDWVTITETGYAKRTKTDLYRSQKRGGKGVQG 
AGLKQDDIVAHFFVCSTHDLILFFTTQGRVYRAKAYDLPEASRTARGQHVANLLAFQPEERIAQVIQIR 
GYTDAPYLVLATRNGLVKKSKLTDFDSNRSGGIVAVNLRDNDELVGAVLCSAGDDLLLVSANGQSIR 
FSATDEALRPMGRATSGVQGMRFNIDDRLLSLNWREGTYLLVATSGGYAKRTAIEEYPVQGRGGK 
GVLWMYDRRRGRLVGALIVDDDSELYAVTSGGGVIRTAARQVRKAGRQTKGVRLMNLGEGDTLLAI 
ARNAEESGDDNAVDANGADQTGN 

>Rv0014c pknB serine-threonine protein kinase TB.seq 15593:17470 MW:66511 SEQ ID NO:151
IWTTTPSHLSDRYELGEILGFGGMSEVHLARDLRLHRDVAVKVLRADLARDPSFYLRFRREAQNAAALN
HPAIVAWDTGEAETPAGPLPYIVMEYVDG\m.RDIVHTEGPMTPKRAIEVlADACQALNFSHQNGIIH 
RDVKPANIMISATNAVKVMDFGIARAIADSGNSVTQTAAVIGTAQYLSPEQARGDSVDARSDVYSLGC 
VLYEVLTGEPPFTGDSPVSVAYQUVREDPIPPSARHEGLSADLDAWLKALAKNPENRYQTAAEMRA
DLVRVHNGEPPEAPKVLTDAERTSLLSSAAGNLSGPRTDPLPRQDLDDTDRDRSIGSVGRWVAWA 
VLAVLTWVmAINTFGGITRDVQVPDN^GQSSADAIATLQNRGFKIRTLQKPDSTIPPDHVIGTDPAAN 
TSVSAGDEITVNVSTGPEQREIPDVSTLTYAEAVKKLTAAGFGRFKQANSPSTPELVGKVIGTNPPAN 
QTSAITNNA/IIIVGSGPATKDIPDVAGQTVDVAQKNLNVYGFTKFSQASVDSPRPAGEVTGTNPPAGT
7VPVDSVIELQVSKGNQFVMPDLSGMFWVDAEPRLRALGWTGMLDKGADVDAGGSQHNRWYQN 
PPAGTGVNRDGIITLRFGQ 

>Rv0016c pbpA TB.seq 18762:20234 MW:51577 SEQ ID NO:152 

MNASLRraSVTVMALI\A.LU.NATMTQVFTADGLRADPRNQR\/LLDEYSRQRGQITAGGQU^Y

DGRFRFLRVYPNPEVYAPVTGFYSLRYSSTALERAEDPILNGSDRRLFGRRLADFFTGRDPRGGNV 
DTTINPRIQQAGWDAMQQGCYGPCKGAWALEPSTGKILALVSSPSYDPNLLASHNPEVQAQAWQR 
LGDNPASPLTNRAISETYPPGSTFKVITTAAALAAGATETEQLTAAPTIPLPGSTAQLENYGGAPCGDE 
PTVSLREAFVKSCNTAFVQLGIRTGADALRSMARAFGLDSPPRPTPLQVAES7VGPIPDSAALGMTSI 

GQKDVALTPLANAEIAATIANGGITMRPYLVGSLKGPDLANISTTVGYQQRRAVSPQVAAKLTELMVG
AEKVAQQKGAIPGVQIASKTGTAEHGTDPRHTPPHAWYIAFAPAQAPKVAVAVLVENGADRLSATGG 
ALAAPIGRAVI EAALQGEP 

>Rv0017c rodA TB.seq 20234:21640 MW:50612 SEQ ID NO:153 

NiTTTRLCW'VANn-pPLPTRRNAELLLI.CFAAVITFAALLWQANQDQGVPWDLTSYGLAFLTLFGSAHL
AIRRFAPYTDPLLLPWALLNGLGLVMIHRLDLVDNEIGEHRHPSAN<MMLVyni.VGVAAFALWTFLK 
DHRQLARYGYICGLAGLVFLAVPALLPAALSEQNGAKIWIRLPGFSIQPAEFSKILLLIFFSAVLVAKRG 
LFTSAGKHLLGMTLPRPRDLAPLLAA\WISVGVMVFEKDLGASLLLYTSFL\A/VYLATQRFSVVWIGL 
TLFAAGTLVAYFIFEHVRLRVQTWLDPFADPDGTGYQIVQSLFSFATGGIFGTGLGNGQPDTVPAAST 

DFIIAAFGEELGLVGLTAILMLYTIVIIRGLRTAIATRDSFGKLLAAGLSSTLAIQLFIWGGVTRLIPLTGLT
TPWMSYGGSSLLANYILLAILARISHGARRPLRTRPRNKSPITAAGTEVIERV 

>Rv0018c ppp TB.seq 21640:23181 MW:53781 SEQ ID NO:154

VARVTLVLRYAARSDRGLVRANNEDSVYAGARLLALADGMGGHAAGEVASQLVIAALAHLDDDEPG 
GDLLAKLDAAVRAGNSAIAAQVEMEPDLEGMGTTLTAILFAGNRLGLVHIGDSRGYLLRDGELTQITK
DDTFVQTLVDEGRITPEEAHSHPQRSLIMRALTGHEVEPTLTMREARAGDRYLLCSDGLSDPVSDETI 
LEALQIPEVAESAHRLIELALRGGGPDNVTVAA/ADWDYDYGQTQPILAGAVSGDDDQLTLPNTAAG 
RASAISQRKEIVKRVPPQADTFSRPRWSGRRLAFWALVTVLMTAGLLIGRAIIRSNYYVADYAGSVSI 
MRGIQGSLLGMSLHQPYLMGCLSPRNELSQISYGQSGGPLDCHLMW-EDLRPPERAQVRAGLPAGT 
LDDAIGQLRELAANSLLPPCPAPRATSPPGRPAPPTTSETTEPNVTSSPASPSPTTSAPAPTGTTPAIP
TSASPAAPASPPTPWPVTSSPTMAALPPPPPQPGIDCRAAA 
>Rv0019c - TB.seq 23273:23737 MW:17153 SEQ ID NO:155

MQGL\/LQLTRAGFLMLLWVFIWSVLRILKTDIYAPTGAVMMRRGUU.RGTLLGARQRRHAARYLVVT 
EGALTGARITLSEQPVLIGRADDSTLVLTDDYASTRHARLSMRGSEV\ri^DLGSTNGTYLDRAKVT^ 
AVRVPIGTPVRIGKTAIELRP 


>Rv0020c - TB.seq 23864:25444 MW:56881 SEQ ID NO:156

MGSQKRLVQRVERKLEQTVGDAFARIFGGSIVPQEVEALLRREAADGIQSLQGNRLLAPNEYIITLGV 
HDFEKLGADPELKSTGFARDUUDYIQEQGWQTYGDWVRFEQSSNLHTGQFRARGTVNPDVETHP 
PVIDCARPQSNHAFGAEPGVAPMSDNSSYRGGQGQGRPDEYYDDRYARPQEDPRGGPDPQGGS 

DPRGGYPPETGGYPPQPGYPRPRHPDQGDYPEQIGYPDQGGYPEQRGYPEQRGYPDQRGYQDQ
GRGYPDQGQGGYPPPYEQRPPVSPGPAAGYGAPGYDQGYRQSGGYGPSPGGGQPGYGGYGEY 
GRGPARHEEGSYVPSGPPGPPEQRPAYPDQGGYDQGYQQGATTYGRQDYGGGADYTRYTESPR 
VPGYAPQGGGYAEPAGRDYDYGQSGAPDYGQPAPGGYSGYGQGGYGSAGTSVTLQLDDGSGRT 
YQLREGSNIIGRGQDAQFRLPDTGVSRRHLEIRWDGQVALLADLNSTNGT7VNNAPVQEWQLADGD 

VIRLGHSEIIVRMH

>Rv0032 bioF2 C-terminal similar to B. subtilis BioF TB.seq 34295:36607 MW:86245 
SEQ ID NO:157 

MPTGLGYDFLRPVEDSGINDLKhfVYFMADLADGQPLGRANLYSVCFDLATTDRKLTPAWRTTIKRWF 
PGFI^FRFLECGLLTMVSNPLALRSDTDLERVLPVLAGQMDQLAHDDGSDFLMIRDVDPEHYQRYL
DILRPLGFRPALGFSRVDTTISWSSVEEALGCLSHKRRLPLKTSLEFRERFGIEVEELDEY/^HAPVLA 
RLWRNVKTEAKDYQREDLNPEFFAACSRHLHGRSRLWLFRYQGTPIAFFLNVWGADENYILLEWGI 
DRDFEHYRKANLYRAALMLSLKDAISRDKRRMEMGITNYFTKLRIPGARVIPTIYFLRHSTDPVHTATL 
ARMMMHNIQRPTLPDDMSEEFCRWEERIRLDQDGLPEHDIFRKIDRQHKYTGLKLGGVYGFYPRFT 
GPQRSTVKAAELGEIVLLGTNSYLGLATHPEWEASAEATRRYGTGCSGSPLLNGTLDLHVSLEQEL
ACFLGKPAAVLCSTGYQSNLAAISALCESGDMIIQDALNHRSLFDAARLSGADFTLYRHNDMDHLARV 
LRRTEGRRRIIWDAVFSMEGTVADLATIAELADRHGCRVYVDESHALGVLGPDGRGASAALGVLAR 
MDWMGTFSKSFASVGGFIAGDRPWDYIRHNGSGHVFSASLPPAAAAATHAALRVSRREPDRRAR 
VLAAAEYMATGLARQGYQAEYHGTAIVPVILGNPTVAhlAGYLRLMRSGVYVNPVAPPAVPEERSGFR 
TSYLADHRQSDLDRALHVFAGLAEDLTPQGAAL

>Rv0050 ponA1 TB.seq 53661:55694 MW:71119 SEQ ID NO:158

WILLPMVTFTMAYLIVDVPKPGDIRTNQVSTtLASDGSEIAKIVPPEGNRVDVNLSQVPMHVRQAVIAA 
EDRNFYSNPGFSFTGFARAVKNNLFGGDLQGGSTITQQYVKNALVGSAQHGWSGLMRKAKELVIAT 
KMSGEWSKDDVLQAYLNIIYFGRGAYGISAASKAYFDKPVEQLTVAEGALLAALIRRPSTLDPAVDPE
GAHARWNWVLDGMVETKALSPNDRAAQVFPETVPPDLARAENQTKGPNGLIERQVTRELLELFNID 
EQTLNTQGLWTTT1DPQAQRAAEKAVAKYLDGQDPDMRAAWSIDPHNGAVRAYYGGDNANGFDF 





AQAGLQTGSSFKVFALVAALEQG1GLGYQVDSSPL7VDGIKITNVEGEGCGTCNIAEALKMSLNTSYY 
RLMLKLNGGPQAVADAAHQAGIASSFPGVAHTLSEDGKGGPPNNGIVLGQYQTRVIDMASAYATLAA 
SGIYHPPHFVQKWSANGQVLFDASTADrfTGDQRIPKAVADNVTAAMEPIAGYSRGHNLAGGRDSA 
AKTGTTQFGDTTANKDAWMVGYTPSLSTAVWVGTVKGDEPLWASGAAIYGSGLPSDIVVKATMDGA 
LXGTSNETFPKPTEVGGYAGVPPPPPPPEVPPSETVIQPTVEIAPGITIPIGPPTTITLAPPPPAPPAAT
PTPPP 



>Rv0051 - TB.seq 55694:57373 MW:61210 SEQ ID NO:159

WGALSQSSNISPLPLAADLRSADNRDCPSRTDVLGAAU^NWGGPVGRHALIGRTRLMTPLRVMFAI 
ALVFLALGWSTKAACLQSTGTGPGDQRVANWDNQRAYYQLCYSDTVPLYGAELLSQGKFPYKSSWI
ETDSNGTPQLRYDGQIAVRYMEYPVLTGIYQYLSMAIAICIVTALSKVAPLPWAEVV^ 
U^VVLTTVWATSGU^GRRIWDAALVAASPLVIFQIFTNFDAU^TGLATSGLUKWARRRPV^ 
SAAKLYPU.FLYPLlJ-LGIRAGRLNALARTMAAAAATV\a.LVNLPVMLLFPRGWSEFFRLNTRRGDDM 
DSLYNWKSFTGWRGFDPTLGFWEPPLVLNTWTLLFVLCCAAIAYIALTAPHRPRVAQLTFLTVASFL 
LVNKVAfVSPQFSLWLVPLAVl^PHRRILLAW^mDALVWVPR^iYYLYGNPSRSLPE

lAVMVLCGLWWQIYRPGRDLVRTGGPGALPACGGVDDPVGGVFANAADAPPGRLPSWLRPRLGD 
EHARERTPDAGRDRTFSGQHRA 

>Rv0106 - TB.seq 124372:125565 MW:43701 SEQ ID NO:160

MRTPVILVAGQDHTDEVTGALLRRTGTWVEHRFDGHWRRMTATLSRGELITTEDALEFAHGCVSC
TIRDDLLVLLRRLHRRDNVGRIWHLAPWLEPQPICWAIDHVRVCVGHGYPDGPAALDVRVAAWTC 
VDCVRWLPQSLGEDELPDGRTVAQVTVGQAEFADLLVLTHPEPVAVAVLRRLAPRARITGGVDRVEL 
ALAHLDDNSRRGRTDTPHTPLLAGLPPLAADGEVAIVEFSARRPFHPQRLHAAVDLLLDGWRTRGR 
LWU^NRPDQVMWLESAGGGLRVASAGK\Anj\AMAASEVAYVDLERRLFADLMWVYPFGDRHTAh^ 

VLVCGADPTDIVNALNAALLSDDEMASPQRWQSYVDPFGDWHDDPCHEMPDAAGEFSAHRNSGES
R 

>Rv0125 - TB.seq 151146:152210 MW:34927 SEQ ID NO:161

MSNSRRRSURWSWLLSVLAAVGLGLATAPAQAAPPALSQDRFADFPALPLDPSAMVAQVGPQWNl 
NTKLGYNNAVGAGTGIVIDPNGWLTNNHVIAGATDINAFSVGSGQTYGVDWGYDRTQDVAVLQLR
GAGGLPSAAIGGGVAVGEPWAMGNSGGQGGTPRAVPGRWALGQTVQASDSLTGAEETLNGLIQ 
FDAAIQPGDSGGPNA/NGLGQWGMNTAASDNFQLSQGGQGFAIPIGQAMAIAGQIRSGGGSPTVHI 
GPTAFLGLGWDNNGNGARVQRWGSAPAASLGISTGDVITAVDGAPINSATAMADALNGHHPGDVI 
SVTWQTKSGGTRTGNVTLAEGPPA 






>Rv0350 dnaK 70 kD heat shock protein, chromosome replication TB.seq 419833:421707 
MW:66832 SEQ ID NO:162 
MARAVGIDLGTTNSWSVLEGGDPVWANSEGSRTTPSIVAFARNGEVLVGQPAKNQAVTNVDRTV
RSVKRHMGSDWSIEIDGKKYTAPEISARILMKLKRDAEAYLGEDITDAVITTPAYFNDAQRQATKDAG 
QIAGLNVLRIVNEPTAAALAYGLDKGEKEQRILVFDLGGGTFDVSLLEIGEGWEVRATSGDNHLGGD 
DWDQRWDWLVDKFKGTSGIDLTKDKMAMQRLREAAEKAKIELSSSQSTSINLPYI7>mADKNPLF^ 
EQLTRAEFQRITQDLLDRTRKPFQSVIADTGISVSEIDHWLVGGSTRMPAVTDLVKELTGGKEPNKG 
VNPDEWAVGAALQAGVLKGEVKDVLLLDWPLSLGIETKGGVMTRLIERr^lPTKRSETFTT^ 
QPSVQIQVYQGEREIAAHNKLLGSFELTGIPPAPRGIPQIEVTFDIDANGIVHVTAKDKGTGKENTIRIQ 
EGSGLSKEDIDRMIKDAEAHAEEDRKRREEADVRNQAETLVYQTEKFVKEQREAEGGSKVPEDTLN 
KVDAAVAEAKAALGGSDISAIKSAMEKLGQESQALGQAIYEAAQAASQATGAAHPGGEPGGAHPGS 
ADDWDAEWDDGREAK 

>Rv0351 grpE stimulates DnaK ATPase activity TB.seq 421707:422411 MW:24501
SEQ ID NO:163 

VTDGNQKPDGNSGEQVTVTDKRRIDPETGEVRHVPPGDMPGGTAAADAAHTEDKVAELTADLQRV 
QAOFANYRKRALRDQQAAAORAKASWSQLLGVLDDLERARKHGDLESGPLKSVADKLDSALTGLG
LVAFGAEGEDFDPVLHEAVQHEGDGGQGSKPVIGTS/MRQGYQLGEQVLRHALVGWDTWVDAAE 
LESVDDGTAVADTAENDQADQGNSADTSGEQAESEPSGS 






>Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase TB.seq 422450:423634 MW:41346
SEQ ID NO:164

MAQREWVEKDFYQELGVSSDASPEEIKRAYRKLARDLHPDANPGNPAAGERFKAVSEAHNVLSDPA 
KRKEYDETRRLFAGGGFGGRRFDSGFGGGFGGFGVGGDGAEFNLNDLFDAASRTGGTTIGDLFGG 
LFGRGGSARPSRPRRGNDLETETELDFVEAAKGVAMPLRLTSPAPCTNCHGSGARPGTSPKVCPTC 
NGSGVINRNQGAFGFSEPCTDCRGSGSIIEHPCEECKGTGVTTRTRTINVRIPPGVEDGQRIRLAGQ 
GEAGLRGAPSGDLYVTVHVRPDKIFGRDGDDLTVTVPVSFTEiJikLGSTLSVPTLDGWGVRVPKGTA
DGRILRVRGRGVPKRSGGSGDLLVTVKVAVPPNLAGAAQEALEAYAAAERSSGFNPRAGWAGNR 
>Rv0363c fba fructose bisphosphate aldolase TB.seq 441266:442297 MW:36545
SEQ ID NO:165

MPIATPEVYAEMLGQAKQNSYAFPAINCTSSETVNAAIKGFADAGSDGIIQFSTGGAEFGSGLGVKDM 
VTGAVALAEFTHVIAAKYPVNVALHTDHCPKDKLDSYVRPLLAISAQRVSKGGNPLFQSHMWDGSAV
PIDENLAIAQELLKAAAAAKIILEIEIGWGGEEDGVANEINEKLYTSPEDFEKTIEALGAGEHGKYLLAA 
TFGNVHGVYKPGNVKLRPDILAQGQQVAAAKLGLPADAKPFDFVFHGGSGSLXSEIEEALRYGWKM 
NVDTDTQYAFTRPlAGHMFTNYDGVLXVDGEVGVKKVnrDPRSYLKKAEASMSQRWQACNDLHCA 
GKSLTH 

>Rv0405 pks6 TB.seq 485729:489934 MW:147615 SEQ ID NO:166

MTDGSVTADKLQKWFREYLSTHIECHPNEVSLDVPIRDLGLKSIDVLAIPGDLGDRFGFCIPDLAVVVD 
NPSANDLIDSLLNQRSADSLRESHGHADRNTQGRGSINEPVAVIGVGCRFPGDIDGPERLWDFLTEK 
KCAITAYPDRGFTNAGTFAESGGFLKDVAGFDNRFFDIPPDEALRMDPQQRLLLEVSWEALEHAGIIP
ESLRLSRTGVFVGVSSTDYVRLVSASAQQKSTIWDNTGGSSSIIANRISYFLDIQGPSIVIDTACSSSLV 
AVHLACRSLSTWDCDIALVGGTNVLISPEPWGGFREAGILSQTGCCHAFDKSADGMVRGEGCGVIVL 
QRLSDARLEGRRILAILTGSAVNQDGKSNGIMAPNPSAQIGVLENACKSARVDPLEIGYVEAHGTGTS 
LGDRIEAHALGMVFGRKRPGSGPLMIGSIKPNIGHLEGAAGIAGLIKAVLMVERGSLLPSGGFTEPNP
AlPFTELGLRWDELQEWPWAGRPRRAGVSSFGFGGTNAHVIVEEAGSVGADTVSGRADVGGSGG 
GVVAVVVISGKTASALAACaAGRLGRYVRARPALDWDVGYSLVSTRSVFDHRAVWGQTRDELLAGL 
AGWAGRPEAGWCGVGKPAGKTAFVFAGQGSQWLGMGSELYAAYPVFAEALDAWDELDRHLRY 
PLRDVIWGHDQDLLNTTEFAQPALFAV^ALYRLLMSWGVRPGLVLGHSVGELAAAHVAGALCLPD 

AAMLVAARGRLMQALPAGGAMFAVQAREDEVAPMLGHDVSIAAVNGPASWISGAHDAVSAIADRL
RGQGRRVHRLAVSHAFHSALMEPMIAEFTAVAAELSVGLPTIPVISNVTGQLVADDFASADYWARHIR 
AWRFGDSVRSAHCAGASRFIEVGPGGGLTSLIEASLADAQIVSVPTLRKDRPEPVSVMTAAAQGFV 
SGMGLDWASVFSGYRPKRVELPTYAFQHQKF\Aaj\PAPSVSDPTAAGQIGASDGGAELLASSGFAA 
RLAGRSADEQUVAAIEWCEHAAAVLGRDGAAGLDAGQAFADSGFNSLSAVELRNRLTAVTAVTLPA 

TAIFDHPTPTELAQYLITQIDGHGSSAAAAANPAERIDALTDLFLQACDAGRDADGWKMVALASNTRE
RMSSPVRNhWSKrWALI^GISDVAA/ICIPTLTVLSDQREYRDIANAMTGRHSWSLTLPGFDSSD/^ 
PQNADMIVETVSNAIIDWGGSCRFVLSGYSSGGVLAYALCSHLSVKHQRNPLGVALIDTYLPSQIAN 
PSMN EGFSPNDTGKGLSREVI RVARMLNRLTATRLTAAATYAAI FCaAWEPGRSMAPVLNIVAKDRIAT 
VENLREERINRWRTAAAEAAYSVAEVPGDHFGMMSTSSEAIATEIHDWISGLVRGPHR 


>Rv0435c - ATPase of AAA-family TB.seq 522348:524531 MW:75315 SEQ ID NO:167 
NmHPDPARQLTLTARLNTSAVDSRRGWRLHPNAIAALGIREVVDAVSLTGSRTTAAVAGLAAADTAV 
GTSn.LDDVTLSNAGLREGTEVlVSPVTS/YGARS\m.SGSTlJVTQSVPPVTLRQALLGKVM 
LPRDLGPGTSTSAASRAU^AAVGlSmSELLTWGVDPDGPVSVQPNSLVTWGAGVPAAMGTSTAG 

QVSISSPEIQIEELKGAQPQAAKLTEWLKLALDEPHLLQTLGAGTNLGVLVSGPAGVGKATLVRAVCD
GRRLVTLDGPEIGALAAGDRVKAVASAVQAVRHEGGVLLITDADALLPAAAEPVASLILSELRTAVATA 
GWLIATSARPDQLDARLRSPELCDRELGLPLPDAATRKSLLEALLNPVPTGDLNLDEIASRTPGFWA 
DUW.VREAALRAASRASADGRPPMLHQDDLLGALTVIRPLSRSASDEVTVGDVTLDDVGDMAAAK 
QALTEAVLWPLQHPDTFARLGVEPPRGVLLYGPPGCGKTFVA^RALASTGQLSVHAVKGSELMDKWV 

GSSEKAVRELFRRARDSAPSLVFLDELDALAPRRGQSFDSGVSDRWAALLTELDGIDPLRDWMLG
ATNRPDLIDPALLRPGRLERLVFVEPPDAAARREILRTAGKSIPLSSDVDLDEVAAGLDGYSAADCVAL 
LREAALTAMRRSIDAANVTAADLATARETVRASLDPLQVASLRKFGTKGDLRS 

>Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase TB.seq 524531:525388
MW:31219 SEQ ID NO:168

MIGKPRGRRGVNLQILPSAMTVLSICAGLTAIKFALEHQPKAAMALIAAAAILOGLDGRVARILDAQSR 
MGAEIDSLADAVNFGWPALVLYVSMLSKWPVGWWVLLYAVCWLRLARYNALQDDGTQPAYAHE 
FFVGMPAPAGAVSMIGLLALKMQFGEGWmSGVVFLSRWTGTSILLVSGIPMKKMHAVSVPPNYAA

AUJ^VUMCAAAAVl^YLLIVVVIIIAYMCHIPFAVRSQRWLAQHPEVWDDKPKQRRA^^ 

YRPSMARLGLRKPGRRL 

>Rv0440 groEL2 60 kD chaperonin 2 TB.seq 528606:530225 MW:56728 SEQ ID NO:169

MAKTIAYDEEARRGLERGLNAIJ^AVKVTLGPKGRrWVLEKKWGAPTITNDGVSIAKEIELEDPYEKI 
GAELVKEVAKKTDDVAGDGTrrAT\njVQAL\mEGLRrWAAGANPLGLKRGIEKAVEK\n^TLLKG/^ 
EVETKEQIAATAAISAGDQSIGDLIAEAMDKVGNEGVITVEESNTFGLQLELTEGMRFDKGYISGYFVT 
DPERQEAVLEDPYIL1.VSSKVSTVKDLLPLLEKV1GAGKPLLIIAEDVEGEALSTLWNKIRGTFKSVAVK 
APGFGDRRKAMLQDMAILTGGQVISEEVGLTLENADLSLLGKARKWVTKDETTIVEGAGDTDAIAGR
VAQIRQEIENSDSDYDREKLQERLAKLAGGVAVIKAGAATEVELKERKHRIEDAVRNAKAAVEEGIVA 
GGGVTLLQAAPTLDELKLEGDEATGANIVKVALEAPLKQIAFNSGLEPGWAEKVRNLPAGHGLNAQT 
GVYEDLLAAGVADPVKVTRSALQNAASIAGLFLTTEAWADKPEKEKASVPGGGDMGGMDF 

>Rv0482 murB TB.seq 570537:571643 MW:38522 SEQ ID NO:170

MKRSGVGSLFAGAHIAEAVPLAPLTTLRVGPIARRVITCTSAEQWAALRHLDSAAKTGADRPLVFAG 
GSNLVIAENLTDLTWRLANSGITIDGNLVRAEAGAVFDDWVRAIEQGLGGLECLSGIPGSAGATPVQ 
NVGAYGAEVSDTITRVRLLDRCTGEVRVWSARDLRFGYRTSVLKHADGI-AVPTWLEVEFALDPSGR 
SAPLRYGELIAALNATSGERADPQAVREAVLALRARKGMVLDPTDHDTWSVGSFFTNPWTQDVYE 

RLAGDAATRKDGPVPHYPAPDGVKLAAGWLVERAGFGKGYPDAGAAPCRLSTKHALALTNRGGAT
AEDWTLARAVRDGVHDVFGITLKPEPN^IGCML 

>Rv0483 - TB.seq 571708:573060 MW:47859 SEQ ID NO:171 

WIRVLFRPVSLIPVNNSSTPQSQGPISRRL^TALGFGVLAPNVLVACAGKVTKLAEKRPPPAPRLTF 
RPADSAADWPIAPISVEVGDGVWQRVALTNSAGKWAGAYSRDRTIYTITEPLGYDTTYTVVSGSAV
GHDGKAVPVAGKFrTVAPVKTINAGFQLADGQTVGIAAPVIIQFDSPISDKAAVERALTVTTDPPVEGG 
WAWLPDEAQGARVHWRPREYYPAGTTVDVDAKLYGLPFGDGAYGAQDMSLHFQIGRRQNA/KAEV 
SSHRIQWTDAGX^MDFPCSYGEADLARrm-RNGIHWTEKYSDFYMSNPAAGYSHIHERWAVRISN 
NGEFIHANPMSAGAQGNSNVTNGCINLSTENAEQYYRSAVYGDPVEVTGSSIQLSYADGDIWDWAV 
DWDTWVSMSALPPPAAKPAATQIPVTAPVTPSDAPTPSGTPTTTNGPGG

>Rv0489 gpm phosphoglycerate mutase I TB.seq 578424:579170 MW:27217 SEQ ID NO:172
MANTGSLVLLRHGESDWNALNLFTGWVDVGLTDKGQAEAVRSGELIAEHDLLPDVLYTSLLRRAITT 
AHLALDSADRLWIPVRRSWRLNERHYGALQGLDKAETKARYGEEQFMAWRRSYDTPPPPIERGSQ 
FSQDADPRYADIGGGPLTECLADWARFLPYFTDVIVGDLRVGKTVLIVAHGNSLRALVKHLDQMSDD
ElVGLNIPTGIPLRYDLDSAMRPLVRGGTYLDPEAAAAGAAAVAGQGRG 
>Rv0490 senX3 sensor histidine kinase TB.seq 579347:580576 MW:44794 SEQ ID NO:173

\nVFSALLLAGVLSALALAVGGAVGMRLTSRVVEQRQRVATEWSGIWSQMLQCI\m.MPLGAAWD 

THRDWYLNERAKELGLVRDRQLDDQAWRAARQALGGEDVEFDLSPRKRSATGRSGLSVHGHARL 

LSEEDRRFAWFVHDQSDYARMEAARRDFVANVSHELKTPVGAMALLAEALLASADDSETVRRFAE 

KVLIEANRLGDMVAELIELSRLQGAERLPNMTDVDVDTIVSEAISRHKVAADNADIEVRTDAPSNLRVL 

GDQTLLVTALANLVSNAIAYSPRGSLVSISRRRRGANIEIAVTDRGIGIAPEDQERVFERFFRGDKARS 

RATGGSGLGLAIVKHVAANHDGTIRVWSKPGTGSTFTLALPALIEAYHDDERPEQAREPELRSNRSQ 
REEELSR 

>Rv0500 proC pyrroline-5-carboxylate reductase TB.seq 590081:590965 MW:30172
SEQ ID NO:174 

MLFGMARIAIIGGGSIGEALLSGLLRAGRQVKDLWAERMPDRANYLAQTYSVLVTSAADAVENATFV 

WAVKPADVEPVIADLANATAAAENDSAEQVFVTWAGITIAYFESKLPAGTPWRAMPNAAALVGAG 

VTALAKGRFVTPQQLEEVSALFDAVGGVLTVPESQLDAVTAVSGSGPAYFFLLVEALVDAGVGVGLS 

RQVATDLAAQTMAGSAAMLLERMEQDQGGANGELMGLRVDLTASRLRAAVTSPGGTTAAALRELE 

RGGFRMAVDAAVQAAKSRSEQLRITPE 

>Rv0528 - TB.seq 618303:619889 MW:57132 SEQ ID NO:175

MWRSLTSMGTAL\A.LFLLALAA1PGALLPQRGLNAAKVDDYLAAHPLIGPWLDELQAFDVFSSFWFTA 

lYVLLFVSLVGCLAPRTIEHARSLRATPVAAPRNLARLPKHAHARLAGEPAALAATITGRLRGWRSITR 

QQGDSVEVSAEKGYLREFGNLVFHFALLGLLVAVAVGKLFGYEGNVMADGGPGFCSASPAAFDSF 

RAGNTV/DGTSLHPICVRVNNFQAHYLPSGQATSFAADIDYQADPATADLIANSWRPYRLQVNHPLRV 

GGDRVYLQGHGYAPTFTVTFPDGQTRTSTVQWRPDNPQTLLSAGWRIDPPAGSYPNPDERRKHQI 

AIQGLLAPTEQLDGTLLSSRFPALNAPAVAIDIYRGDTGLDSGRPQSLFTLDHRLIEQGRLVKEKRVNL 

RAGQQVRIDQGPAAGTWRFDGAVPFVNLQVSHDPGQSVVVLVFAITMMAGLLVSLLVRRRRVWARI 

TPTTAGTVNVELGGLTRTDNSGWGAEFERLTGRLLAGFEARSPDMAEAAAGTGRDVD 

>Rv0667 rpoB [beta] subunit of RNA polymerase TB.seq 759805:763320 MW:129220 
SEQ ID NO:176 

LADSRQSKTAASPSPSRPQSSSNNSVPGAPNRVSFAKLREPLEVPGLLDVQTDSFEWLIGSPRWRE 

SAAERGDVNPVGGLEEVLYELSPIEDFSGSMSLSFSDPRFDDVKAPVDECKDKDMTYAAPLFVTAEF 

INNNTGEIKSQTVFMGDFPMMTEKGTFIINGTERVWSQLVRSPGVYFDETIDKSTDKTLHSVKVIPSR 

GAWLEFDVDKRDTVGVRIDRKRRQPVTVLLKALGWTSEQIVERFGFSEIMRSTLEKDNTVGTDEALL 

DIYRKLRPGEPPTKESAQTLLENLFFKEKRYDLARVGRYKVNKKLGLHVGEPITSSTLTEEDWATIEY 

LVRLHEGQTTMTVPGGVEVPVETDDIDHFGNRRLRTVGELIQNQIRVGMSRMERWRERMTTQDVE 

AITPQTLINIRPWAAIKEFFGTSQLSQFMDQNNPLSGLTHKRRLSALGPGGLSRERAGLEVRDVHPS 

HYGRMCPIETPEGPNIGLIGSLSVYARVNPFGFIETPYRKWDGWSDEIVYLTADEEDRHWAQANS 
PIDADGRF=VEPRVLVRRKAGEVEYVPSSEVDYMDVSPRQMVSVATAMIPFLEHDDANRALMGANMQ
RQAVPLVRSEAPLVGTGMELRAAIDAGDNAA/AEESGVIEEVSADYITVMHDNGTRRPmMRKFARSN 
HGTCANQCPIVDAGDRVEAGQVIADGPCTDDGEMALGKNLLVAIMPWEGHNYEDAIILSNRLVEEDV 
LTSIHIEEHEIDARDTKLGAEEITRDIPNISDEVLADLDERGIVRIGAEVRDGDILVGKVTPKGETELTPE 
ERLLRAIFGEKAREVRDTSLKVPHGESGKVIGIRVFSREDEDELPAGVNELVRVYVAQKRKISDGDKL
AGRHGNKGVIGKILPVEDMPFLADGTPVDIILNTHGVPRRMNIGQILETHLGWCAHSGWKVDAAKGV 
PDWAARLPDELLEAQPNAIVSTPVFDGAQEAELQGLLSCTLPNRDGDVLVDADGKAMLFDGRSGEP 
FPYP\m/GY^^IMKLHHL^^DKIHARSTGPYSMITQQPLGGKAQFGGQRFGEMECWAMQAYGAAY 
TLQELLTIKSDDWGRVKWEAIVKGENIPEPGIPESFKVLLXELQSLCLNVEVLSSDGAAIELREGEDE 
DLERAAANLGINLSRNESASVEDLA

>Rv0668 rpoC [beta]' subunit of RNA polymerase TB.seq 763368:767315 MW:146740
SEQ ID NO:177

VLDVNFFDELRIGUkTAEDIRQWSYGEVKKPETINYRTLKPEKDGLFCEKIFGPTRDWECYCGKYKRV 

RFKGIICERCGVENmRAKVRRERMGHIELAAPVTHIVVYFKGVPSRLGYLLDLAPKDLEKIIYFAAYVITS
VDEEMRHNELSTLEAEMAVERKAVEDQRDGELEARAQKLEADLAELEAEGAKAOARRKVRDGGER 
EMRQIRDRAQRELDRLEDIWSTFTKLAPKQLIVDENLYRELVDRYGEYFTGAMGAESIQKLIENFDIDA 
EAESLRDVI RNGKGQKKLRALKRLKWAAFQQSGNSPMGMVLDAVPVIPPELRPMVQLDGGRFATS 
DLNDLYRRVINRNNRLKRLIDLGAPEIIVNNEKRMLQESVDALFDNGRRGRPVTGPGNRPLKSLSDLL 

KGKQGRFRQNLLGKRVDYSGRSVIWGPQLKLHQCGLPKLMALELFKPFVMKRLVDLNHAQNIKSAK
RMVERQRPQVWDVLEEVIAEHPVLLNRAPTLHRLGIQAFEPMLVEGKAIQLHPLVCEAFNADFDGDQ 
MAVHLPLSAEAQAEARILMLSSNNILSPASGRPLAMPRLDMWGLYYLTTEVPGDTGEYQPASGDHP 
ETGVYSSPAEAIMAADRGVLSVRAKIKVRLTQLRPPVEIEAELFGHSGWQPGDAWMAETTLGRVMF 
NELLPLGYPFVNKQMHKKVQAAIINDIJ^ERYPMIWAQTVDKLKDAGFYWATRSGVWSMADVLVPP 

RKKEILDHYEERADKVEKQFQRGALNHDERNEALVEIWKEATDEVGQALREHYPDDNPIITIVDSGAT
GNFTQTRTLAGMKGLVTNPKGEFIPRPVKSSFREGLTVLEYFIhlTHGARKGLADTALRTADSGYLTRR 
LVDVSQDVIVREHDCQTERGIWELAERAPDGTLIRDPYIETSAYARTLGTDAVDEAGNVIVERGQDL 
GDPEI DALLAAGITQVKVRSVLTCATSTGVCATCYGRSMATGKLVDIGEAVGIVAAQSIGEPGTQLTM 
RTFHQGGVGEDITGGLPRVQELFEARVPRGKAPIADVTGRVRLEDGERFYKITIVPDDGGEEWYDKI 

SKRQRLRVFKHEDGSERVLSDGDHVEVGQQLMEGSADPHEVLRVQGPREVQIHLVREVQEVYRAQ
GVSIHDKHIEVIVRQMLRRVTIIDSGSTEFLPGSLIDRAEFEAENRRWAEGGEPAAGRPVLMGITKAS 
LATDSWLSAASFQETTRVLTDAAI NCRSDKLNGLKENVIIGKLI PAGTGI NRYRNI AVQPTEEARAAAYT 
IPSYEDQYYSPDFGAATGAAVPLDDYGYSDYR 

>Rv0711 atsA TB.seq 806333:808693 MW:86216 SEQ ID NO:178

iyW»EATEAFNGTIELDIRDSEPDWGPYAAPVAPEHSPNILYLVWDDVGIATWDCFGGLVEMPAMTRV 
AERGVRLSQFHTTALCSPTRASLLTGRNATTVGMATI EEFTDGFPNCNGRI PADTALLPEVLAEHGYN 
TYCVGKWHLTPLEESNMASTKRHWPTSRGFERFYdFLGGETDQWYPDLVYDNHPVSPPGTPEGG
YHLSKDIADKTIEFIRDAKVIAPDKPWFSYVCPGAGHAPHHVFKEWADRYAGRFDMGYERYREIVLE 
RQKALGIVPPDTELSPINPYLDVPGPNGETVVPLQDTVRPWDSLSDEEKKLFCRMAEVFAGFLSYTDA 
QIGRILDYLEESGQLDNTIIWISDNGASGEGGPNGSVNEGKFFNGYIDTVAESMKLFDHLGGPQTYN 
HYPIGWAVl^FNTPYKLFKRYASHEGGIADPAIISWPNGIAAHGEIRDNYVrWSDITPT\m3LLGiy^

GTVKGIPQKPMDGVSFIAALADPAADTGKTTQFYTMLGTRGIWHEGWFANTIHAATPAGWSNFNAD 
RWELFHIAADRSQCHDUVAEHPDKLEELKALWFSEAAKYNGLPLADLNLLETMTRSRPYLVSERASY 
Vrrv'PDCADVGIGAAVEIRGRSFAVLADVTIDTTGAEGVLFKHGGAHGGHVLFVRDGRLHYVYNFLGE 
RQQLVSSSGPVPSGRHLLGVRYLRTGTVPNSHTPVGDLELFFDENLVGALTNVLTHPGTFGLAGAAI 
SVGRNGGSAVSSHYEAPFAFTGGTITQVTVDVSGRPFEDVESDLALAFSRD

>Rv0764c - lanosterol 14-demethylase cytochrome P450 TB.seq 856683:858035 MW:50879
SEQ ID NO: 179 

MSAVALPRVSGGHDEHGHLEEFRTDPIGLMQRVRDECGDVGTFQLAGKQWLLSGSHANEFFFRA 
GDDDLDQAKAYPFMTPIFGEGWFDASPERRKEMLHNAALRGEQMKGHAATIEDQVRRMIADWGE
AGEIDLLDFFAELTIYTSSACLIGKKFRDQLDGRFAKLYHELERGTDPLAYVDPYLPIESFRRRDEARN 
GLVALVADIMNGRIANPPTDKSDRDMLDVLIAVKAETGTPRFSADEITGMFISMMFAGHHTSSGTASW 
TLIELMRHRDAYAAVIDELDELYGDGRSVSFHALRQIPQLENVLKETLRLHPPLIILMRVAKGEFEVQG 
HRIHEGDLVAASPAISNRIPEDFPDPHDFVPARYEQPRQEDLLNRWTWIPFGAGRHRCVGAAFAIMQI 
KAIFSVLLREYEFEMAQPPESYRNDHSKMWQLAQPACVRYRRRTGV

>Rv0861c - DNA helicase TB.seq 958524:960149 MW:59773 SEQ ID NO:180

VQSDKTVLLEVDHELAGAARAAIAPFAELERAPEHVHTYRITPLALWNARAAGHDAEQWDALVSYS 

RYAVPQPLL\«DIVDTMARYGRLOLVKNPAHGLTLVSLDRA\l.EEVLRNKKIAPMLGARIDDDTWVHP 

SERGRVKQLLLKIGWPAEDLAGYVDGEAHPISLHQEGWQLRDYQRLAADSFWAGGSGWVLPCGA
GKTLVGAAAMAKAGATTLILVTNIVAARQWKRELVARTSLTENEIGEFSGERKEIRPVTISTYQMITRR 
TKGEYRHLELFDSRDWGLIIYDEVHLLPAPVFRMTADLQSKRRLGLTATLIREDGREGDVFSLIGPKR 
YDAPWKDIEAQGWIAPAECVE\m\n-MTDSERMMYATAEPEERYRICSTV/HTKiAWKSILAKHPDEQ 
TLVIGAYLDQLDELGAELGAPVIQGSTRTSEREALFDAFRRGEVATLWSKVANFSIDLPEAAVAVQVS 

GTFGSRQEEAQRLGRILRPKADGGGAIFYSWARDSLDAEYAAHRQRFLAEQGYGYIIRDADDLLGP
Al 

>Rv0904c accD3 TB.seq 1006694:1008178 MW:51741 SEQ ID NO:181 

VSRITTDQLRHAVLDRGSFVSWDSEPLAVPVADSYARELAAARAATGADESVQTGEGRVFGRRVAV 
VACEFDFLGGSIGVAAAERITAAVERATAERLPLLASPSSGGTRMQEGTVAFLQMVKIAAAIQLHNQA
RLPYLVYLRHPTTGGVFASWGSLGHLTVAEPGALIGFLGPRVYELLYGDPFPSGVQTAENLRRHGIID 
GWALDRLRPMLDRALTVLIDAPEPLPAPQTPAPVPDVPTWDSWASRRPDRPGVRQLLRHGATDR 
VLLSGTDQGEAATTLLALARFGGQPTVVLGQQRAVGGGGSTVGPAALREARRGMALAAELCLPLVL
VIDAAGPALSAAAEQGGU^GQIAHCU^LNn^DTPWSILLGQGSGGPALAMLPADRVLAALHGWLAP 
LPPEGASAIVFRDTAHAAELAAAQGIRSADLLKSGIVDTIVPEYPDAADEPIEFALRLSNAIAAEVHALR 
KIPAPERLATRLQRYRRIGLPRD 

>Rv0983 - TB.seq 1099064:1100455 MW:46454 SEQ ID NO:182

MAKLARWGLVQEEQPSDMTNHPRYSPPPQQPGTPGYAQGQQQTYSQQFDWRYPPSPPPQPTQY 
RQPYEALGGTRPGLIPGVIPTMTPPPGMVRQRPRAGMLAIGAVTIAWSAGIGGAAASLVGFNRAPA 
GPSGGPVAASAAPSIPAANMPPGSVEQVAAKWPSWMLETDLGRQSEEGSGIILSAEGLILTNNHVI 
AAAAKPPLGSPPPKTTVTFSDGRTAPFTWGADPTSDIAWRVQGVSGLTPISLGSSSDLRV6QPVLA
IGSPLGLEGTVTTGIVSALNRPVSTTGEAGNQNTVLDAIQTDAAINPGNSGGALVNMNAQLVGVNSAI 
ATLGADSADAQSGSIGLGFAIPVDQAKRIADELISTGKASHASLGVQVTNDKDTLGAKIVEWAGGAA 
ANAGVPKGWVTKVDDRPINSADALVAAVRSKAPGATVALTFQDPSGGSRTVQVTLGKAEQ 

>Rv1008 - Similar to E.coli protein YcfH TB.seq 1127087:1127878 MW:29066 SEQ ID NO:183

LVDAimiLDACGARDADTVRSLVERAAAAGVTAVVWADDLESARVVV^RAAEWDRRVYAAVALHPT 
RADALTDAARAELERLVAHPRWAVGETGIDMYWPGRLDGCAEPhlVQREAFAWHIDLAKRTGKPLM 
IHNRQADRDVLDVLRAEGAPDTViLHCFSSDAAIVIARTCVDAGWLLSLSGTVSFRTARELREAVPLMP 
VEQLLV^DAPYLTPHPHRGLANEPYCLPYTVRALAELVNRRPEEVALITTSNARRAYGLGVVMRQ 

>Rv1009 - lipoprotein, similar to various other MTB proteins TB.seq 1128089:1129174 MW:38079
SEQ ID NO:184 

MLRLWGALLLVU^AGGYAVAACKTVTLTVDGTAMRVTTMKSRVIDIVEENGFSVDDRDDLYPAAG 
VQVHDADTIVLRRSRPLQISLDGHDAKQNWTTASTVDEALAQLAMTDTAPAAASRASRVPLSGMALP 
WSAKTVQLNDGGLVRTVHLPAPNVAGLLSAAGVPLLQSDHWPAATAPiVEGIVIQIQVTRNRIKKVTE
RLPLPPNARRVEDPEMNiy/ISREVVEDPGVPGTQDVTFAVAEVNGVETGRLPVAN\A/VTPAHEAWR 
VGTKPGTEVPPVIDGSIWDAIAGCEAGGNWAINTGNGYYGGVQFDQGTWEANGGLRYAPRADLAT 
REEQIAVAEVTRLRQGWGAWPVCAARAGAR 

>Rv1010 ksgA 16S rRNA dimethyltransferase TB.seq 1129150:1130100 MW:34647
SEQ ID NO:185 

IVICCTSGCALTIRLLGRTEIRRI^ELDFRPRKSLGQNFVHDANTVmRWAASGVSRSDLVLEVGPGL 
GSLTLALLDRGATWAVEIDPLIJVSRLQQTVAEHSHSEVHRLTWNRDVLALRREDLAAAPTAWANL 
PYNVAVPALLHLLVEFPSIRWTVMVQAEVAERLAAEPGSKEYGVPSVlaRFFGR\^RCG^WSPT^ 
WPiPRVYSGLVRIDRYETSPWPTDDAFRRRVFELVDIAFAQRRKTSRNAFVQWAGSGSESANRLLAA
SIDPARRGETLSIDDFVRLLRRSGGSDEATSTGRDARAPDISGHASAS 
>Rv1011 - Similar to E.coli protein YcbH TB.seq 1130189:1131106 MW:31350

SEQ ID NO: 186 

VPTGSVm^RVPGKVNLYlJ^VGDRREDGYHELTTVFHAVSLVDEVTVRNADVLSLELVGEGADQLPTD 
ERNLAWQAAELMAEHVGRAPDVSIMIDKSIPVAGGMAGGSADAAAVLVAMNSLVVELNVPRRDLRML 
AARLGSDV^FALHGGTALGTGRGEELATWSRNTFHWVLAFADSGLLTSAVYNELDRLREVGDPPRL
GEPGPVLAALAAGDPDQLAPLLGNEMQAAAVSLDPALARALRAGVEAGALAGIVSGSGPTCAFLCTS 
ASSAI DVGAQLSG AGVCRTVRVATGPVPGARWSAPTEV 

>Rv1106c - cholesterol dehydrogenase TB.seq 1232845:1233954 MW:40743 SEQ ID NO:187
MLRRMGDASLTTELGRVLVTGGAGFVGANLVTTLLDRGHVVVRSFDRAPSLLPAHPQLEVLQGDITD
ADVCAAAVDGIDTIFHTAAIIELMGGASVTDEYRQRSFAVNVGGTENLLHAGQRAGVQRFVYTSSNS 
WMGGQNIAGGDETLPYTDRFNDLYTETKWAERFVLAQNGVDGMLTCAIRPSGIWGNGDQTMFRK 
LFESVLXGHVKVLVGRKSARLDNSYVHNLIHGFILAAAHLVPDGTAPGQAYFINDAEPINMFEFARPVL 
EACGQRWPKMRISGPAVRWVMTGWQRLHFRFGFPAPLLEPLAVERLYLDNYFSIAKARRDLGYEPL 
FTTQQALTECLPYYVSLFEQMKNEARAEKTAATVKP

>Rv1110 lytB2 TB.seq 1236183:1237187 MW:36298 SEQ ID NO:188 

MVPTVDMGIPGASVSSRSVADRPNRKRVLLAEPRGYCAGVDRAVETVERALQKHGPPVYVRHEIVH 
NRHWDTLAKAGAVFVEETEQVPEGAIWFSAHGVAPTVHVSASERNLQVIDATCPLVTKVHNEARR 
FARDDYDILLIGHEGHEEWGTAGEAPDHVQLVDGVDAVDQVrVRDEDKWWLSQTTLSVDETMEIV
GRLRRRFPKLQDPPSDDICYATQNRQVAVKAMAPECELVIWGSRNSSNSVRLVEVALGAGARAAH 
LVDWADDIDSAWLDGVrrVGVTSGASVPEVLVRGVLERLAECGYDIVQPVTTANETLVFALPRELRS 
PR 

>Rv1216c - TB.seq 1359473:1360144 MW:24863 SEQ ID NO:189

MHIGLKIFIWGVLGLWFGALLFGPAGTFDYWQAWVFLAAFVSTTIGPTIYLARNDPAALQRRMRSGP 
UVEGRTIQKFIVIGAFLGFFAMMVLSACDHRYGWSSVPAAVCVIGDVLVMTGLGIAMLWIQNRYAAS 
TVRVEAGQILASDGLYKIVRHPMYAGNNA/MMTGIPLALGSYWAMFILVPGTLVLVFRILDEEKLLTQEL 
SGYREYRQLVRYRLVPYWV 

>Rv1223 htrA TB.seq 1365810:1367456 MW:56547 SEQ ID NO:190

VSHLSQRMAGLLRVHGEWSRSVDTRVDTDNAMPARFSAQIQNEDEVTSDQGNNGGPNGGGRLAP 
RPVFRPPVDPASRQAFGRPSGVQGSFVAERVRPQKYQDQSDFTPNDQLADPVLQEAFGRPFAGAE 
SLQRHPIDAGALAAEKDGAGPDEPDDPWRDPAAAAALGTPALAAPAPHGALAGSGKLGVRDVLFGG 
KVSYlJ\LGILVAIALVIGGIGGVIGRKTAE\^AFTTSK\n"LSTTGNAQEPAGRFTKVAAAVADSWTIE
SVSDQEGMQGSGVIVDGRGYIVTNNHVISEAANNPSQFKTTWFNDGKEVPANLVGRDPKTDLAVLK 
VDNVDNLTVARLGDSSKVRVGDEVLAVGAPLGLRSTVTQGIVSALHRPVPLSGEGSDTDTVIDAIQTD 
ASINHGNSGGPLIDMDAQV1GINTAGKSLSDSASGLGFAIPVNEMKLVANSLIKDGKIVHPTLGISTRSV

SNAIASGAQVANVKAGSPAQKGGILENDVIVKVGNRAVADSDEFWAVRQLAIGQDAPIEWREGRH 

VTLTVKPDPDST 

>Rv1224 - TB.seq 1367461:1367853 MW:14083 SEQ ID NO:191

VFANIGWWEMLVLVMVGLNA/LGPERLPGAIRWAASALRQARDYLSGVTSQLREDIGPEFDDLRGHL 
GELQKLRGMTPRAALTKHLLDGDDSLFTGDFDRPTPKKPDAAGSAGPDATEQIGAGPIPFDSDAT 

>Rv1229c mrp similar to MRP/NBP35 ATP-binding proteins TB.seq 1371778:1372947 MW:41064 
SEQ ID NO:192

MPSRLHSAVMSGTRDGDLNAAIRTALGKVIDPELRRPITELGMVKSIDTGPDGSVHVEIYLTIAGCPKK 
SEITERVTRAVADVPGTSAVRVSLDVMSDEQRTELRKQLRGDTREPVIPFAQPDSLTRVYAVASGKG 
GVGKSTVm'NLAAAMAN^GLSIGVLDADIHGHSIPRMMGTTDRPTQVESMILPPIAHQVKVISIAQFTQ 
GNTPWWRGPMLHRALQQFLADVYWGDLDVLLLDLPPGTGDVAISVAQU PN AELLWTTPQLAAAE 
VAERAGSIALQTRQRIVGWENMSGLTLPDGTTMQVFGEGGGRLVAERLSRAVGADVPLLGQIPLDP
ALVAAGDSGVPLVLSSPDSAIGKELHSIADGLSTRRRGLAGMSLGLDPTRR 

>Rv1239c corA magnesium and cobalt transport protein TB.seq 1381943:1383040 MW:41470
SEQ ID NO: 193 

VFPGFDALPEVLRPVARPQPPNAHPVAQPPAQALVDCGVYVCGQRLPGKYTYAAALREVREIELTG
QEAFVWIGLHEPDENQMQDVADVFGLHPLAVEDAVHAHQRPKLERYDETLFLVLKTVNYVPHESW 
U^EIVKTGEIMIFVGKDFWTVRHGEHGGLSEVRKRMDADPEHLRLGPYAVMHAIADYWDHYLEVT 
NLMETDIDSIEEVAFAPGRKLDIEPiVLLKREWELRRCVNPLSTAFQRMQTESKDLISKEVRRYLRDV 
ADHQTEAADQIASYDDMLNSLVQAALARVGMQQNMDMRKISAWAGIIAVPTMIAGIYGiy/INFHFMPEL 

DSRWGYPTVIGGMVLICLFLYHVFRNRNWL

>Rv1279 - TB.seq 1430060:1431643 MW:57332 SEQ ID NO:194

MDTQSDYVWGTGSAGAWASRLSTDPATTWALEAGPRDKNRFIGVPAAFSKLFRSEIDWDYLTEP 
QPEUDGREIYWPRGKVLGGSSSMNAMMVVVRGFASDYDEWAARAGPRWSYADVLGYFRRIENVTA 
AWHFVSGDDSGVTGPLHISRQRSPRSVTAAWLAAARECGFAAARPNSPRPEGFCETWTQRRGAR 

FSTADAYLKPAMRRKNLRVLTGATATRWl DGDRAVGVEYQSDGQTRIVYARREW^CAGAVNSPQL
LMLSGIGDRDHLAEHDIDTVYHAPEVGCNLLDHLVWLGFDVEKDSLFAAEKPGQLISYLLRRRGMLT 
SNVGEAYGFVRSRPELKLPDLELIFAPAPFYDEALVPPAGHGWFGPILVAPQSRGQITLRSADPHAK 
PVIEPRYLSDLGGVDRAAMMAGLRIC/^AQARPLRDLLGSIARPRNSTELDEATLELALATCSHTLYH 
PMGTCRMGSDEASWDPQLRVRGVDGLRVADASVMPSTVRGHTHAPSVLIGEKAADLIRS 

>Rv1294 thrA homoserine dehydrogenase TB.seq 1449373:1450695 MW:45522 SEQ ID NO:195
VPGDEKPVGVAVLGLGNVGSEWRIIENSAEDLAARVGAPLVLRGIGVRRVTTDRGVPIELLTDDIEEL 
VAREDVDIWEVIV/IGPVEPSRKAILGALERGKSWTANKALLATSTGELAQAAESAHVDLYFEAAVAGA 
IPVIRPLTQSLAGDTVLRVAGIVNGTTNYILSAMDSTGADYASALADASALGYAEADPTADVEGYDAA
AKAAILASIAFHTRWADDWREGITKVTPADFGSAHALGCTIKLLSICERITTDEGSQRVSARVYPALV 
PLSHPU\AVNGAFNAVWEAEAAGRLMFYGQGAGGAPTASAVrGDLVMAARNRVLGSRGPRESKY 
AQLPVAPMGFIETRYYVSMNVADKPGVLSAVAAEFAKREVSIAEVRQEGWDEGGRRVGARIVWTH 
LATDAALSETVDALDDLDWQGVSSVIRLEGTGL

>Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thiL) TB.seq 1485860:1487026 MW:40049
SEQ ID NO: 196 

VIVAGARTPIGKLMGSLKDFSASELGAIAIKGALEKANVPASLVEYVIMGQVLTAGAGQMPARQAAVA 
AGIGWDVPALTINKMCLSGIDAIALADQURAREFDWVAGGQES^^^<APHLL^NSRSGYKYGDVTVL
DHMAYDGLHDVFTDQPMGALTEQRNDVDMFTRSEQDEYAAASHQKAAAAWKDGVFADEVIPVNIP 
QRTGDPLQFTEDEGIRANTTAAALAGLXPAFRGDGTITAGSASQISDGAAAWVMNQEKAQELGLTW 
LAEIGAHGWAGPDSTLQSQPANAINKALDREGISVDQLDWEINEAFAAVALASIRELGLNPQIVNVN 
GGAIAVGHPLGMSGTRITLHAALQLARRGSGVGVAALCGAGGQGOALILRAG 

>Rv1389 gmk putative guanylate kinase TB.seq 1564399:1565022 MW:22064 SEQ ID NO:197 
VSVGEGPDTKPTARGQPAAVGRVWLSGPSAVGKSTWRCLRERIPNLHFSVSATTRAPRPGEVDG 
VDYHFIDPTRFQQLIDQGELLEWAEIHGGLHRSGTLAQPVRAAAATGVPVLIEVDLAGARAIKKTMPE 
A\m/FLAPPSWQDLQARLIGRGTETADV1QRRLDTARIELAAQGDFDKWVNRRLESACAELVSLLVG 
TAPGSP

>Rv1407 fmu similar to Fmu protein TB.seq 1583099:1584469 MW:48494 SEQ ID NO:198
MTPRSRGPRRRPLDPARRAAFETLRAVSARDAYANLVLPALLAQRGIGGRDAAFATELTYGTCRAR 
GLLDAVIGAAAERSPQAIDPVLLDLLRLGTYQLLRTRVDAHAAVSTTVEQAGIEFDSARAGFVNGVLR 
TIAGRDERSWVGELAPDAQNDPIGHAAFVHAHPRWIAQAFADALGAAVGELEAVLASDDERPAVHLA
ARPGVLTAGELARAVRGTVGRYSPFAVYLPRGDPGRLAPVRDGQALVQDEGSQLVARALTLAPVDG 
DTGRWLDLCAGPGGKTALLAGLGLQCAARVTAVEPSPHRADLVAQNTRGLPVELLRVDGRHTDLDP 
GFDRVLVDAPCTGLGALRRRPEARWRRQPADVAALAKLQRELLSAAIALTRPGGWLYATCSPHLAE 
TVGAVADALRRHPVHALDTRPLFEPVIAGLGEGPHVQLWPHRHGTDAMFAAALRRLT 

>Rv1409 ribG riboflavin biosynthesis TB.seq 1585192:1586208 MW:35367 SEQ ID NO:199
MNVEQVKSIDEAMGUVIEHSYQVKGTTYPKPPVGAVIVDPNGRiVGAGGTEPAGGDHAEWALRRAG 
GLAAGAiWVTMEPCNHYGKTPPCVNALIEARVGTWYAVADPNGIAGGGAGRLSAAGLQVRSGVLA 
EQVAAGPLREWLHKQRTGLPHVTWKYATSiDGRSAAADGSSQWISSEAARLDLHRRRAIADAILVGT 
GTV/IJ^DDPALTARIJ^DGSlJ^PQQPLRWVGKRDIPPEARVLNDEARTMMiRTHEPMENA.RALSDRTD
VLLEGGPTLAGAFLRAGAINRILAYVAPILLGGPVTAVDDVGVSNITNALRWQFDSVEKVGPDLLLSLV 
AR 
>Rv1440 secG TB.seq 1617715:1618065 MW:12140 SEQ ID NO:200

VAG\n"AAVSARLKADEARRPGFYAAGSGPLPQ\mGSTLPVMEUU.QITLIVTSVLNA/Ll.V^ 

GLSTLFGGGVQSSLSGSTWEKNLDRLTLFVTGIWLVSIIGVALLIKYR 

>Rv1484 inhA TB.seq 1674200:1675006 MW:28529 SEQ ID NO:201

MTGLLDGKRILVSGIITDSSIAFHIARVAQEQGAQLVLTGFDRLRLIQRITDRLPAKAPLLELDVQNEEH 
LASLAGRVTEAIGAGNKLDGWHSIGFMPQTGMGINPFFDAPYADVSKGIHISAYSYASMAKALLPIM 
NPGGSIVGMDFDPSRAMPAYNWMTVAKSALESVNRFVAREAGKYGVRSNLVAAGPIRTLAMSAIVG 
GALGEEAGAQIQLLEEGWDQRAPIGWNMKDATPVAKTVCALLSDWLPATTGDIIYADGGAHTQLL 

>Rv1617 pykA pyruvate kinase TB.seq 1816187:1817602 MW:50668 SEQ ID NO:202 

VTRRGKIVCTLGPATQRDDLVRALVEAGMDVARMNFSHGDYDDHKVAYERVRVASDATGRAVGVL 
ADLQGPKIRLGRFASGATHWAEGEWRITVGACEGSHDRVSTTYKRLAQDAVAGDRVLVDDGKVAL 
WDAVEGDDWCTVVEGGPVSDNKGISLPGMNVTAPALSEKDIEDLTFALNLGVDMVALSFVRSPAD 
VELVHEVMDRIGRRVPVIAKLEKPEAIDNLEAIVLAFDAVMVARGDLGVELPLEEVPLVQKRAIQMARE
NAKPVIVATQMLDSMIENSRPTRAEASDVANAVLDGADALMLSGETSVGKYPLAAVRTMSRIICAVEE 
NSTAAPPLTHIPRTKRGVISYAARDIGERLDAKALVAFTQSGDTVRRLARLIHTPLPLLAFTAWPEVRS 
QU^IVrrWGTETFIVPKMQSTDGMIRQVDKSLLELARYKRGDLWIVAGAPPGWGSTNLIHVHRIGEDD 
V 

>Rv1630 rpsA 30S ribosomal protein S1 TB.seq 1833540:1834982 MW:53203 SEQ ID NO:203
MPSPTWSPQVAVNDIGSSEDFU^DKTIKYFNDGDIVEGTIVKVDRDEVLLDIGYKTEGVIPARELSIK 
HDVDPNEWSVGDEVEALVLTKEDKEGRLILSKKRAQYERAWGTIEALKEKDEAVKGTVIEWKGGLI 
LDIGLRGFLPASLVEMRRVRDLQPYIGKEIEAKIIELDKNRNNNA/LSRRAWLEQTQSEVRSEFLNNLQK 
GTIRKGWSSIVNFGAFVDLGGVDGLVHVSELSWKHIDHPSEWQVGDEVTVEVLDVDMDRERVSLS
LKATQEDPWRHFARTHAIGQIVPGKVTKLVPFGAFVRVEEGIEGLVHISELAERHVEVPDQWAVGDD 
AMVKVIDIDLERRRISLSLKQANEDYTEEFDPAKYGMADSYDEQGNYIFPEGFDAETNEWLEGFEKQ 
RAEWEARYAEAERRHKMHTAQMEKFAAAEAAGRGADDQSSASSAPSEKTAGGSLASDAQLAALRE 
KLAGSA 

>Rv1631 - TB.seq 1835011:1836231 MW:44669 SEQ ID NO:204

MLRIGLTGGIGAGKSLLSTTFSQCGGIWDGDVLAREWQPGTEGLASLVDAFGRDILLADGALDRQA 
LAAKAFRDDESRGVLNGIVHPLVARRRSEIIAAVSGDAVWEDIPLLVESGMAPLFPLVWVHADVELR 
VRRLVEQRGMAEADARARIAAQASDQQRRAVADVWLDNSGSPEDLVRRARDVWNTRVQPFAHNL 
AQRQIARAPARLVPADPSWPDQARRIVNRLKIACGHKALRVDHIGSTAVSGFPDFLAKDVIDIQVTVE
SLDVADELAEPLLAAGYPRLEHITQDTEKTDARSTVGRYDHTDSAALWHKRVHASADPGRPTNVHLR 
>mGWPNQQFALLF=VDWLAANPGAREDYLTVKCDADRRADGELARYVTAKEPWFLDAYQRAVVEWA
DAVHWRP 

>Rv1706c - TB.seq 1932695:1933876 MW:39779 SEQ ID NO:205
MTLDVPVNQGHVPPGSVACCLVGVTAVADGIAGHSLSNFGALPPEINSGRMYSGPGSGPLMAAAAA
WDGUVAELSSAATGYGAAISELTNMRVmSGPASDSMVAAVLPFVGWLSTTATLAEQAAMCWRA^ 
AAFEAAFA^/^rVPPPAIAANRTLL^^^\^TNWFGQNTPAIATTESQYAEMWAQDAAAMYGYASAAAP 
ATVLTPFAPPPQTTNATGLVGHATAVAALRGQHSWAAAIPWSDIQKYWMMFLGALATAEGFIYDSG 
GLTLNALQFVGGMLWSTALAEAGAAEAAAGAGGAAGWSAWSQLGAGPVAASATLAAKIGPMSVPP 
GWSAPPATPQAQWARSIPGIRSAAEAAETSVLLRGAPTPGRSRAAHMGRRYGRRLTVMADRPNVG

>Rv1745c - similar to Q46822 ORF_0182 TB.seq 1971381:1971989 MW:22490 SEQ ID NO:206 
MTRSYRPAPPIERWU.NDRGDATGVADKATVHTGDTPLHLAFSSYWDLHDQLLITRRAATKRTVVP 
AVV\n-NSCCGHPLPGESLPGAIRRRLAAELGLTPDRVDLILPGFRYRAAMADGTVENEICPVYRVQVD 
QQPRPNSDEVDAIRWLSWEQFVRDVTAGVIAPVSPWCRSQLGYLTXLGPCPAQWPVADDCRLPKA
AHGN 

>Rv1800 - TB.seq 2039451:2041415 MW:67068 SEQ ID NO:207

MLPNFAVLPPEVNSARVFAGAGSAPMLAAAAAWDDLASELHCAAMSFGSVTSGLWGWWQGSASA 
AMVDAAASYIGWLSTSAAHAEGAAGLARAAVSVFEEALAATS/HPAIV1VAANRAQVASLVASNLFGQN
APAIAALESLYECMWAQDAAAMAGYYVGASAVATQLASWLQRLQSIPGAASLDARLPSSAEAPMGV 
VRAVNSAIAANAAAAQTVGLVMGGSGTPIPSARYVELANALYMSGSVPGVIAQALFTPQGLYPWVIK 
NLTFDSSVAQGAVILESAIRQQIAAGNNVrVFGYSQSATISSLVMANLAASADPPSPDELSFTLIGNPN 
NPNGGVATRFPGISFPSLGVTATGATPHNLYPTKIYTIEYDGVADFPRYPLNFVSTLNAIAGTYYVHSN 
YFILTPEQlbAAVPLTNTVGPTly4TQYYIIRTENLPLLEPLRSVPIVGNPLANLVQPNLKVIVNLGYGDPA
YGYSTSPPNVATPFGLFPEVSPWIADALVAGTQQGIGDFAYDVSHLELPLPADGSTMPSTAPGSGT 
PVPPLSIDSLIDDLQVANRNLANTISKVAATSYATVLPTADIANAALTIVPSYNIHLFLEGIQQALKGDPM 
GLVNAVGYPLAADVALFTAAGGLQLUIISAGRTIANDISAIVP 

>Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram -) TB.seq 2093732:2095186
MW:51548 SEQ ID NO:208 

MSSSESPAGIAQIGVTGLAVMGSNIARNFARHGYTVAVHNRSVAKTDALLKEHSSDGKFVRSETIPEF 
LAALEKPRRVLIMVKAGEATDADAVINELADAMEPGDIIiDGGNALYTDTMRREKAMRERGLHFVGAG 
ISGGEEGALNGPSIMPGGPAESYQSLGPLLEEISAHVDGVPCCTHIGPDGSGHFVKMVHNGIEYSDM 
QLIGEAYQLMRDGLGLTAPAIADVFTEWNNGDLDSYLVEITAEVLRQTDAKTGKPLVDVIVDRAEQKG
TGRWTVKSALDLGVPVTGIAEAVFARALSGSVGQRSAASGLASGKLGEQPADPATFTEDVRQALYA 
SKIVAYAQGFNQIQAGSAEFGWDITPGDLATIWRGGCIIRAKFLNHIKEAFDASPNLASLIVAPYFRGA 
VESAIDSWRRWSTAAQLGIPTPGFSSALSYYDALRTARLPAALTQAQRDFFGAHTYGRIDEPGKFHT
LWSSDRTEVPV 

>Rv1900c lipJ TB.seq 2146246:2147631 MW:49685 SEQ ID NO:209 
VAQAPHIHRTRYAKCGDMDIAYQVLGDGPTDLLVLPGPFVPIDSIDDEPSLYRFHRRLASFSRVIRLDH
RGVGLSSRLAAITTLGPKFWAQDAIAVMDAVGCEQATIFAPSFHAMNGLVLAADYPERVRSLIWNGS 
ARPLWAPDYPVGAQVRRADPFLTVALEPDAVERGFDVLSIVAPTVAGDDVFRAWWDLAGNRAGPP 
SIARAVSKVIAEADVRDVLGHIEAPTLILHRVGSTYIPVGHGRYLAEHIAGSRLVELPGTDTLYWVGDT 
GPMLDEIEEFITGVRGGADAERMLATIMFTDIVGSTQHAAALGDDRWRDLLDNHDTIVCHEIQRFGGR 
EVNTAGDGFVATFTSPSAAIACADDIVDAVAALGIEVRIGIHAGEVEVRDASHGTDVAGVAVHIGARVC
ALAGPSEVLVSSTVRDIVAGSRHRFAERGEQELKGVPGRWRLCVLMRDDATRTR 

>Rv1967 - TB.seq 2210599:2211624 MW:36516 SEQ ID NO:210

MRENLGGWVRLGWLAVCLLTAFLLIAVFGEVRFGDGKTYYAEFANVSNLRTGKLVRIAGVEVGKVT 
RISINPDAWRVQFrADNS\nT.TRGTRAVlRYDNLFGDRYLALEEGAGGLAVLRPGHTIPLARTQPALD
LDALIGGFKPLFRALNPEQVNALSEQLLHAFAGQGPTIGSLLAQSAAVTtiTLADRDRLIGQVITNLNW 
LGSLGAHTDRLDQAVTSLSALIHRLAQRKTDlSNAVAYTNAAAGSVADLLSCaARAPLAKWRETDRVA 
Gl AAADHDYLDNLLNTLPDKYQAL\mQGMYGDFFAFYLCDWLK\n»4GKGGQPV\1l<LAGQDSGRCA 
PK 

>Rv1975 - TB.seq 2218050:2218712 MW:23650 SEQ ID NO:211

MSRRASATCALSATTAVAIMAAPAARADDKRLNDGWANVYTVQRQAGCTNDVTINPQLQLAAQWH 
TLDLLNNRHLNDDTGSDGSTPQDRAHAAGFRGKVAEWAINPAVAISGIELINQWYVTNJPAFFAIMSDC 
AhTTQIGVWSENSPDRTVWAVYGQPDRPSAMPPRGAVTGPPSPVAAQENVPIDPSPDYDASDElEY 
GINWLPWILRGVYPPPAMPPQ

>Rv1981c nrdF ribonucleotide reductase small subunit TB.seq 2224221:2225186 MW:36591 
SEQ ID NO:212 

MTGKLVERVHAINWNRLLDAKDLQVWERLTGNFWLPEKIPLSNDLASWQTLSSTEQQTTIRVFTGLT 
LLDTAQATVGAVAMIDDAVTPHEEAVLTNMAFMESVHAKSYSSIFSTLCSTKQIDDAFDWSEQNPYL
QRKAQIIVDYYRGDDALKRKASSVMLESFLFYSGFYLPMYWSSRGKLTNTADLIRLIIRDEAVHGiYYIG 
YKCQRGLADLTDAERADHREYTCELLIHTLYANEIDYAHDLYDELGVVTDDVLPYMRYNANKALANLG 
YQPAFDRDTCQVNPAVRAALDPGAGENHDFFSGSGSSYVMGTHQPTTDTDWDF 

>Rv2092c helY helicase, Ski2 subfamily TB.seq 2349335:2352052 MW:99576 SEQ ID NO:213
WEU^LDRFTAELPFSLDDFQQRACSALERGHGVLVCAPTGAGKTVVGEFAVHLALAAGSKCFYTT 
PLKALSNQKHTDLTARYGRDQIGLLTGDLSVNGNAP^A/V^/^^TEVLRNMLYADSPALQGLSYWMD^ 
VHFLADRMRGPVWEEVILQLPDDVRWSLSAWSNAEEFGGWIQTVRGDTTVWDEHRPVPLWQHV
LVGKRMFDLFDYRIGEAEGQPQVNRELLRHIAHRREADRMADWQPRRRGSGRPGFYRPPGRPEVI 
AKLDAEGLLPAITFVFSRAGCDAAVTQCLRSPLRLTSEEERARIAEVIDHRCGDLADSDLAVLGYYEW 
REGLLRGLAAHHAGMLPAFRHTVEELFTAGLVKAVFATETUVLGINMPARTVA/LERLVKFNGEQHMP 
LTPGEYTQLTGRAGRRGIDVEGHAWIWHPEIEPSEVAGLASTRTFPLRSSFAPSYNMTINLVHRMGP
QQAHRLLEQSFAQYQADRSWGLVRGIERGNRILGEIAAELGGSDAPILEYARLRARVSELERAQARA 
SRLQRRQAATDALAALRRGDIITITHGRRGGLAWLESARDRDDPRPLVLTEHRWAGRISSADYSGTT 
PVGSMTLPKRVEHRQPRVRRDLASALRSAAAGLVIPAARRVSEAGGFHDPELESSREQLRRHPVHT 
SPGLEDQIRQAERYLRIERDNAQLERKVAAATNSLARTFDRFVGLLTEREFIDGPATDPWTDDGRLL 
ARIYSESDLLVAECLRTGAWEGLKPAELAGWSAWYETRGGDGQGAPFGADVPTPRLRQALTQTS
RLSTTLRADEQAHRITPSREPDDGFVRVIYRWSRTGDLAAALAAADVNGSGSPLLAGDFVRWCRQV 
LDLLDQVRNAAPNPELRATAKRAIGDIRRGWAVDAG 

>Rv2101 helZ helicase, Snf2/Rad54 family TB.seq 2360238:2363276 MW:111632
SEQ ID NO:214

MLVLHGFWSNSGGMRLWAEDSDLLVKSPSQALRSARPHPFAAPADLIAGIHPGKPATAVLLLPSLRS 
APLDSPELIRLAPRPAARTDPMLU^WTVPWDLDPTAAU^DQPAPDVRYGASVDYLAELAVFA^ 
VERGR\/LPQLRRDTHGAAACVVRPV1.QGRDWAMTSLVSAMPPVCRAEVGGHDPHELATSALDAMV 
DAAVRAALSPMDLLPPRRGRSKRHRAVEAWLTALTCPDGRFDAEPDELDALAEALRPWDDVGIGTV 

GPARATFRLSEVETENEETPAGSLWRLEFLLQSTQDPSLLVPAEQAWNDDGSLRRWLDRPQELLLT
ELGRASRIFPELVPALRTACPSGLELDADGAYRFLSGTAAVLDEAGFGVLLPSWWDRRRKLGLVLSA 
YTPVDGWGKASKFGREQLVEFRWELAVGDDPLSEEEIAALTETKSPLIRLRGQWVALDTEQMRRGL 
EFLERKPTGRKTTAEILAU^SHPDDVDTPLEWAVRADGWLGDLLAGAAAASLQPLDPPDGFTATLR 
PYQQRGLAWLAFLSSLGLGSCLADDMGLGKTVQLLALETLESVQRHQDRGVGPTLLLCPMSLVGN 

WPQEAARFAPNLRVYAHHGGARLHGEALRDHLERTDLWSTYTTATRDIDELAEYEWNRWLDEAQ
AVKNSLSRAAKAVRRLRAAHRVALTGTPMENRLAELWSIMDFLNPGLLGSSERFRTRYAIPIERHGHT 
EPAERLRASTRPYILRRLKTDPAIIDDLPEKIEIKQYCXaLTTEQASLYQAWADMMEKiENTEGIERRGN 
VLAAMAKLKQVCNHPAQLLHDRSPVGRRSGKVIRLEEILEEILAEGDRVLCFTQFTEFAELLVPHLAAR 
FGRAARDIAYLHGGTPRKRRDEMVARFQSGDGPPIFLLSLKAGGTGLNLTAANHWHLDRWWNPAV 

ENQATDRAFRIGQRRWQVRKFICTGTLEEKIDEMIEEKKALADLVVTDGEGVVLTELSTRDLREVFAL
SEGAVGE 

>Rv2110c prcB proteasome [beta]-type subunit 2 TB.seq 2369727:2370599 MW:30274
SEQ ID NO:215 

VTWPLPDRLSI NSLSGTPAVDLSSFTDFLRRQAPELLPASISGG APLAGGDAQLPHGTTIVALKYPGG
WMAGDRRSTQGNMISGRDVRKVYITDDYTATGIAGTAAVAVEFARLYAVELEHYEKLEGVPLTFAG 
KINRLAIMVRGNLAAAMQGLLALPLLAGYDIHASDPQSAGRIVSFDAAGGWNIEEEGYQAVGSGSLFA 
KSSMKKLYSQVTDGDSGLRVAVEALYDAADDDSATGGPDLVRGIFPTAVIIDADGAVDVPESRIAELA
RAII ESRSGADTFGSDGGEK 

>Rv2118c - = B2126_C1_165 (83.6%) TB.seq 2377471:2378310 MW:30091 SEQ ID NO:216 
VSATGPFSIGERVQLTDAKGRRYTMSLTPGAEFHTHRGSIAHDAVIGLEQGSWKSSNGALFLVLRPL
LVDYVMSMPRGPQVIYPKDAAQIVHEGDIFPGARVLEAGAGSGALTLSLLRAVGPAGQVISYEQRAD 
HAEHARRNVSGCYGQPPDNWRLWSDLADSELPDGSVDRAVLDMLAPWEVLDAVSRLLVAGGVLM 
VWAWrQLSRIVEALRAKQCV\n-EPRAVVETLQRGWNWGU\\mPQHSMRGIHTAFLVATRRI^ 
VAPAPLGRKREGRDG 

>Rv2144c - TB.seq 2404166:2404519 MW:12028 SEQ ID NO:217

MLIIALVLALIGLLALVFANA/TSNQLVAWVCIGASVLGVALLIVDALRERQQGGADEADGAGETGVAEE 
ADVDYPEEAPEESQAVDAGVIGSEEPSEEASEATEESAVSADRSDDSAK 

>Rv2146c - TB.seq 2405667:2405954 MW:10805 SEQ ID NO:218

LWFFQILGFALFIFmLLIARWVEFIRSFSRDWRPTGVTVVILEIIMSITDPPVKVLRRLIPQLTIGAVRF 
DLSIMVLLLVAFIGMQLAFGAAA 

>Rv2147c - TB.seq 2406119:2406841 MW:27630 SEQ ID NO:219 
VNSHCSHTFITDNRSPRARRGHAMSTLHKVKAYFGMAPMEDYDDEYYDDRAPSRGYARPRFDDDY
GRYDGRDYDDARSDSRGDLRGEPADYPPPGYRGGYADEPRFRPREFDRAEMTRPRFGSWLRNST 
RGALAMDPRRMAMMFEDGHPLSKITTLRPKDYSEARTIGERFRDGSPVIMDLVSMDNADAKRLVDF 
AAGLAFALRGSFDKVATKVFLLSPADVDVSPEERRRtAETGFYAYQ 

>Rv2148c - TB.seq 2406841:2407614 MW:27694 SEQ ID NO:220

MAADLSAYPDRESELTHALAAMRSRLAAAAEAAGRNVGEIELLPITKFFPATDVAILFRLGCRSVGES 
REQEASAKAMELNRLLAAAELGHSGGVHWHMVGRIQRNKAGSLARWAHTAHSVDSSRLVTALDRA 
WAALAEHRRGERLRVYVQVSLDGDGSRGGVDSTTPGAVDRICAQVQESEGLELVGLMGIPPLDWD 
PDEAFDRLQSEHNRVRAMFPHAIGLSAGMSNDLEVAVKHGSTCVRVGTALLGPRRLRSP 

>Rv2150c ftsZ TB.seq 2408386:2409522 MW:38757 SEQ ID NO:221

l\/rrPPHNYLAVIKWGIGGGGVNAVNRMIEQGLKGVEFIAINTDAQALLMSDADVKLDVGRDSTRGL^ 
AGADPEVGRKAAEDAKDEIEELLRGADMVFVTAGEGGGTGTGGAPWASIARKLGALTVGWTRPF 
SFEGKRRSNQAENGIAALRESCDTLIVIPNDRLLQMGDAAVSLMDAFRSADEVLLNGVQGITDLITTP 
GLINVDFADVKGIMSGAGTALMGIGSARGEGRSLKAAEIAINSPLLEASMEGAQGVLMSIAGGSOLGL
FEINEAASLVQDAAHPDANIIFGTVIDDSLGDEVRVTVIAAGFDVSGPGRKPVMGETGGAHRIESAKA 
GKLTSTLFEPVDAVSVPLHTNGATLSIGGDDDDVDVPPFMRR 
>Rv2152c murC TB.seq 2410639:2412120 MW:51146 SEQ ID NO:222

VSTEQLPPDLRRVHMVGIGGAGMSGIARILLDRGGLVSGSDAKESRGVHALRARGALIRIGHDASSL 
DLLPGGATAVAm-HAAlPKTNPELVEARRRGIPNA/LRPAVLAKLMAGRTTLMVTGTHGKTTTTSMLIV 
LQHCGLDPSFAVGGELGEAGTNAHHGSGDCFVAEADESDGSLLQYTPHVAVITNIESDHLDFYGSVE
AYVAVFDSFVERIVPGGALWCTDDPGGAALAQRATELGIRVLRYGSVPGETMAATLVSWQQQGVG 
AVAHIRLASELATAQGPRVMRLSVPGRHMALNALGALLAAVQIGAPADEVLDGLAGFEGVRRRFELV 
GTCGVGKASVRVFDDYAHHPTEISATLAAARIVIVLEQGDGGRCMWFQPHLYSRTKAFAAEFGRALN 
AADEVFVLDVYGAREQPLAGVSGASVAEHVTVPMRYVPDFSAVAQQVAAAASPGDVIVTMGAGDVT 
LLGPEILTALRVRANRSAPGRPGVLG

>Rv2153c murG TB.seq 2412120:2413349 MW:41829 SEQ ID NO:223 

VKDTVSQPAGGRGATAPRPADAASPSCGSSPSADSVSWLAGGGTAGHVEPAMAVADALVALDPR 
VRITALGTLRGLETRLVPQRGYHLELITAVPMPRKPGGDLARLPSRVWRAVREARDVLDDVDADVW 
GFGGYVALPAYLAARGLPLPPRRRRRIPWIHEANARAGLANRVGAHTADRVLSAVPDSGLRRAEW
GVPVRASIAALDRAVLRAEARAHFGFPDDARVLLVFGGSQGAVSLNRAVSGAAADLAAAGVCVLHA 
HGPQNVLELRRRAQGDPPYVAVPYLDRMELAYAAADLVICRAGAMTVAEVSAVGLPAIYVPLPIGNG 
EQRLNALPWNAGGGMWADAALTPELVARQVAGLLTDPARLAAMTAAAARVGHRDAAGQVARAAL 
AVATGAGARTTT 

>Rv2154c ftsW TB.seq 2413349:2414920 MW:56306 SEQ ID NO:224 

VLTRLLRRGTSDTDGSQTRGAEPVEGQRTGPEEASNPGSARPRTRFGAWLGRPMTSFHLIIAVAALL 
TTLGLIMVLSASAVRSYDDDGSAVVVIFGKQVLV\m.VGLIGGYVCLRMSVRFMRRIAFSGFAITIVMLVL 
VLVPGIGKEANGSRGWFWAGFSMQPSEIJ^AFAIWGAHLLAARRMERASLREMLIPLVPAAWAL 
ALIVAQPDLGQTVSMGIILLGLLWYAGLPLRVFLSSLAAWVSAAILAVSAGYRSDRVRSWLNPENDP
QDSGYQARQAKFALAQGGIFGDGLGQGVAKWNYLPNAHNDFIFAIIGEELGLVGALGLLGLFGLFAY 
TGMRIASRSADPFLRLLTATTTLWVLGQAFINIGYVIGLLPVTGLQLPLISAGGTSTAATLSLIGIIANAAR 
HEPEAVAALRAGRDDKVNRLLRLPLPEPYLPPRLEAFRDRKRANPQPAQTQPARKTT»RTAPGQPAR 
QMGLPPRPGSPRTADPPVRRSVHHGAGQRYAGQRRTRRVRALEGQRYG 

>Rv2155c murD TB.seq 2414935:2416392 MW:49314 SEQ ID NO:225

VLDPLGPGAPVLVAGGRVTGQAVAAVLTRFGATPTVCDDDPVMLRPHAERGLPTVSSSDAVQQITG 
YALWASPGFSPATPLLAAAAAAGVPIWGDVELAWRLDAAGCYGPPRSWLWTGTNGKTTTTSMLH 
AMLIAGGRRAVLCGNIGSAVLDVLDEPAELLAVELSSFQLHWAPSLRPEAGAVLNIAEDHLDWHATM 
AEYTAAKARVLTGGVAVAGLDDSRAAALLDGSPAQVRVGFRLGEPAARELGVRDAHLVDRAFSDDL
TLLPVASIPVPGPVGVLDALAAAALARSVGVPAGAIADAVTSFRVGRHRAEWAVADGITYVDDSKAT 
NPHAARASVLAYPRWWIAGGLLKGASLHAEVAAMASRLVGAVLIGRDRAAVAEALSRHAPDVPWQ 
WAGEDTGMPATVEVPVACVLDVAKDDKAGETVGAAVMTAAVAAARRMAQPGDTVLLAPAGASFD
QFTGYADRGEAFATAVRAVIR 

>Rv2156c murX TB.seq 2416397:2417473 MW:37714 SEQ ID NO:226 
MRQILIAVAVAVTVSILLTPVLIRLFTKQGFGHQIREDGPPSHHTKRGTPSMGGVAILAGIWAGYLGAH
LAGLAFDGEGIGASGLLVLGLATALGGVGFIDDLIKIRRSRNLGLNKTAKTVGQITSAVLFGVLVLQFRN 
AAGLTPGSADLSWREIATVmJ^VLFVLFCWIVSAWSNAVNFTDGLDGLAAGTMAMVTAAYVLITF 
WQYRNACVTAPGLGCYNVRDPLDLALIAAATAGACIGFLWWNAAPAKIFMGDTGSLALGGVIAGLSV 
TSRTEIU^\m.GALFVAEITSWLQILTFRTTGRRMFRMAPFHHHFELVGWAETTVIIRFVVLLTAITCGL 
GVALFYGEWLAAVGA

>Rv2157c murF TB.seq 2417473:2419002 MW:51634 SEQ ID NO:227 

MIELTVAQIAEIVGGAVADISPQDAAHRRVTGTVEFDSRAIGPGGLFLALPGARADGHDHAASAVAAG 
AAWLAARPVGVPAIWPPVAAPNVLAGVLEHDN DGSGAAVLAALAKLATAVAAQLVAGGLTI IGITGS 

SGKTSTKDLMAA\AJKPLGEWAPPGSFNNELGHPWTVLRATRRTDYLILEMAARHHGNIAAI^IA^
SIGWLNVGTAHLGEFGSREVIAQTKAELPQAVPHSGAWLNADDPAVAAMAKLTAARWRVSRDNT 
GDVWAGPVSLDELARPRFTLHAHDAQAEVRLGVCGDHQVTNALCAAAVALECGASVEQVAAALTAA 
PPVSRHRMQVTTRGDGVTVIDDAYNANPDSMRAGLQALAWIAHQPEATRRSWAVLGEMAELGEDAI 
AEHDRIGRLAVRLDVSRLWVGTGRSISAMHHGAVLEGAWGSGEATADHGADRTAVNVADGDAALA 

LLRAELRPGDWLVKASNAAGLGAVADALVADDTCGSVRP

>Rv2158c murE TB.seq 2419002:2420606 MW:55310 SEQ ID NO:228

VSSLARGISRRRTEVATQVEAAPTGLRPNAWGVRLAALADQVGAALAEGPAQRAVTEDRTVTGVTL 
RAQDVSPGDLFAALTGSTTHGARHVGDAIARGAVAVLTDPAGVAEIAGRAAVPVLVHPAPRGVLGGL 

AATWGHPSERLTVIGITGTSGKTTTTYLVEAGLRAAGRVAGLIGTIGIRVGGADLPSALTTPEAPTLQA
MlJ^A^4VERGVDT^A/MEVSSHALALGRVDGTRFAVGAFTNLSRDHLDFHPSMADYFEAKASLFDPDS 
ALRARTAWCIDDDAGRAMAARAADAITVSAADRPAHWRATDVAPTDA6GQQFTAIDPAGVGHHIGI 
RLPGRYNVANCLVALAILDTVGVSPEQAVPGLREIRVPGRLEQIDRGQGFLALVDYAHKPEALRSVLT 
TLAHPDRRLAWFGAGGDRDPGKRAPMGRIAAQLADLVWTDDNPRDEDPTAIRREILAGAAEVGGD 

AQWEIADRRDAIRHAVAWARPGDWLIAGKGHETGQRGGGRVRPFDDRVELAAALEALERRA

>Rv2159c - TB.seq 2420632:2421663 MW:36377 SEQ ID NO:229 

MKFVNHIEPVAPRRAGGAVAEVYAEARREFGRLPEPLAMLSPDEGLLTAGWATLRETLLVGQVPRG 
RKEAVAAAVAASLRCPWCVDAHTTMLYAAGQTDTAAAILAGTAPAAGDPNAPYVAWAAGTGTPAGP 
PAPFGPDVAAEYLGTAVQFHFIARLVLVLLDETFLPGGPRAQQLMRRAGGLVFARKVRAEHRPGRST
RRLEPRTLPDDLAWATPSEPIATAFAALSHHLDTAPHLPPPTRQWRRWGSWHGEPMPMSSRWTN 
EHTAELPADLHAPTRLALLTGL/kPHQVTDDDVAAARSLLDTDAALVGALAWAAFTAARRIGTWIGAAA
EGQVSRQNPTG 

>Rv2163c pbpB TB.seq 2425049:2427085 MW:72506 SEQ ID NO:230 
VSRAAPRRASQSQSTRPARGLRRPPGAQEVGQRKRPGKTQKARQAQEATKSRPATRSDVAPAGR
STRARRTRQWDVGTRGASF=VFRHRTGNAVILVLMLVAATQLFFLQVSHAAGLRAQAAGQLKVTDV 
QPAARGSIVDRNNDRLAFTIEARALTFQPKRIRRQLEEARKKTSAAPDPQQRLRDIAQEVAGKLNNKP 
DAAAVLKKLQSDETFVr«TARAVDPAVASAlCAKYPEVGAERQDLRQYPGGSU\ANWGGIDWDGHG 
LLGLEDSLDAVLAGTDGSNm'DRGSDGWIPGSYRNRHKAVHGSTVVLTLDNDIQFYVQQQVQQAK 

NLSGAH^IVSAWLDAKTGEVLJ^MANDNTFDPSQDIGRQGDKQLGNPAVSSPFEPGSVNKIVAASAVI
EHGLSSPDEVLQVPGSIQMGGVTVHDAWEHGVMPYTTTGVFGKSSNVGTLMLSQRVGPERYYDML 
RKFGLGQRTGVGLPGESAGLVPPIDQWSGSTFANLPIGQGLSMTLLQMTGMYQAIANDGVRVPPRII 
KATVAPDGSRTEEPRPDDIRWSAQTAQ7VRQMLRAWQRDPMGYQQGTGPTAGVPGYQMAGKT 
GTAQQINPGCGCYFDDVYWITFAGIATADNPRYVIGIMLDNPARNSDGAPGHSAAPLFHNIAGWLMQ 

RENVPLSPDPGPPLVLQAT

>Rv2165c - TB.seq 2428236:2429423 MW:42498 SEQ ID NO:231

VQTRAPWSLPEATLAYFPNARFVSSDRDLGAGAAPGIAASRSTACQTWGGITVADPGSGPTGFGUV 
PVLAQRCFELLTPALTRYYPDGSCaAVLLDATIGAGGHAERFLEGLPGLRLIGLDRDPTALDVARSRLV 
RFADRLTLVHTRYDCLGAALAESGYAAVGSVDGILFDLGVSSMQLDRAERGFAYATDAPLDMRMDP
TTPLTAADIVNTYDEAALADILRRYGEERFARRIAAGIVRRRAKTPFTSTAELVALLYQAIPAPARRVGG 
HPAKRTFQALRIAVNDELESLRTAVPAALDALAIGGRIAVLAYQSLEDRIVKRVFAEAVASATPAGLPV 
ELPGHEPRFRSLTHGAERASVAEIERNPRSTPVRLRALQRVEHRAQSQQWATEKGDS 

>Rv2166c - TB.seq 2429428:2429856 MW:15912 SEQ ID NO:232

MFLGTYTPKLDDKGRLTLPAKFRDALAGGLMVTKSQDHSLAVYPRAAFEQLARRASKAPRSNPEAR 

AFLRNLAAGTDEQHPDSQGRITLSADHRRYASLSKDCWIGAVDYLEIWDAQAWQNYQQIHEENFSA 

ASDEALGDIF 

>Rv2197c - TB.seq 2461505:2462146 MW:22481 SEQ ID NO:233
MVSRYSAYRRGPDVISPDVIDRILVGACAAVWLVFTGVSVAAAVALMDLGRGFHEMAGNPHTTWVL
YAVIWSALVIVGAIPVLLRARRMAEAEPATRPTGASVRGGRSIGSGHPAKRAVAESAPVQHADAFEV 
AAEWSSEAVDRIWIJ^GT\A/LTSAIGIALIAVAAATYLMAVGHDGPSWISYGLAG\A/TAGMPVIEVVLYA 
RQLRRWAPQSS 

>Rv2198c - TB.seq 2462149:2463045 MW:30955 SEQ ID NO:234 
MSGPNPPGREPDEPESEPVSDTGDERASGNHLPPVAGGGDKLPSDQTGETDAYSRAYSAPESEHV
TGGPYVPADLRLYDYDDYEESSDLDDELAAPRWPWWGVAAIIAAVALWSVSLLVTRPHTSKLATG 
DTTSSAPPVQDEITTTKPAPPPPPPAPPPTTEIPTATETQTVTVTPPPPPPPATTTAPPPATTTTAAAP 
PPTTTTPTGPRQVTYSVTGTKAPGDIISVmVDAAGRRRTQHNNAnPWSIvrr^
FRVSKLNCSITTSDGTVLSSNSNDGPQTSC 

>Rv2199c - TB.seq 2463234:2463650 MW:14866 SEQ ID NO:235 
MHIEARLFEFVAAFFWTAVLYGVLTSMFATGGVEWAGTTALALTGGMALIVATFFRFVARRLDSRPE
DYEGAEISDGAGELGFFSPHSVVWPIMVALSGSVAAVGIALVVLPVVLIAAGVAFILASAAGLVFEYYVGP 
EKH 

>Rv2200c ctaC TB.seq 2463661:2464749 MW:40449 SEQ ID NO:236

WPRGPGRLQRLSQCRPQRGSGGPARGLRQUVLAAMLGALAVTVSGCSWSEALGIGVVPEGITPEA
HLNRELWIGAVIASLAVGViVWGLIFWSAVFHRKKNTDTELPRQFGYNMPLELVLTVIPFLIISVLFYFT 
\AA/QEKMLQIAKDPEVVlDITSFQWNVyft<FGYQRVNFKDGTLTYDGADPERKRAMVSKPEGKDKYGE 
ELVGPVRGU^TEDRTYLNFDKVETLGTSTEIPVLVLPSGKRIEFQMASADVIHAFVVVPEFLFKRDVMP 
NPVANNSVNVFQIEEITKTGAFVGHCAEMCGTYHSMMNFEVRWTPNDFKAYLQQRIDGKTNAEALR 

AINQPPLAVTTHPFDTRRGELAPQPVG

>Rv2427c proA g-glutamyl phosphate reductase TB.seq 2724231:2725475 MW:43746
SEQ ID NO:237 

MTVPAPSQLDLRQEVHDAARRARVAARRU^SLPTrVKDRALHAAADELLAHRDQILAANAEDLNAAR 
EADTPAAMLDRLSLNPQRVDGIAAGLRQVAGLRDPVGEVLRGYTLPNGLQLRQQRVPLGWGMIYE
GRPNVTVDAFGLTLKSGNAALLRGSSSAAKSNEALVAVLRTALVGLELPADAVQLLSAADRATVTHLI 
QARGLVDWIPRGGAGLIEAWRDAQVPTIETGVGNCHVYVHQAADLDVAERILLNSKTRRPSVCNA 
AETLLVDAAIAETALPRLLAALQHAGVTVHLDPDEADLRREYLSLDIAVAWDGVDAAIAHINEYGTGH 
TEAIVTTNLDAAQRFTEQIDAAAVMVNASTAFTDGEQFGFGAEIGISTQKLHARGPMGLPELTSTKWI 
AWGAGHTRPA

>Rv2438c - similar to YHN4_YEAST P38795 TB.seq 2734793:2737006 MW:80492 
SEQ ID NO:238 

MGLLGGQSGPRVGSGPVGSIPTPVNAAICQQRGGFHGVERGYSAGDSGVLTSLGDNERTMNFYSA 
YQHGFVRVAACTHHTTIGDPAANAASVLDMARACHDDGAALAVFPELTLSGYSIEDVLLQDSLLDAV
EDALLDLVn-ESADLLPVLWGAPLRHRHRIYNTAWIHRGAVLGWPKSYLPTYREFYERRQMAPGD 
GERGTIRIGGADVAFGTDLLFAASDLPGFVLHVEICEDMFVPMPPSAEAALAGATVLANLSGSPITIGR 
AEDRRLLARSASARCLAAYVYAAAGEGESTTDLAWDGQTMIWENGALLAESERFPKGVRRSVADVD 
TELLRSERLRMGTFDDNRRHHRELTESFRRIDFALDPPAGDIGLLREVERFPFVPADPQRLQQDCYE 
AYNIQVSGLEQRLRALDYPKWIGVSGGLDSTHALIVATHAMDREGRPRSDILAFALPGFATGEHTKN
NAIKLARALGVTFSEIDIGDTARLMLHTIGHPYSVGEKVYDVTFENVQAGLRTDYLFRIANQRGGIVLG 
TGDLSELALGWSTYGVGDQMSHYNVNAGVPKTLIQHLIRWVISAGEFGEKVGEVLQSVLDTEITPELI 
PTGEEELQSSEAKVGPFALQDFSLFQVLRYGFRPSKIAFLAWHAWNDAERGNWPPGFPKSERPSYS
LAEIRHWLQIFVQRFYSFSQFKRSALPNGPKVSHGGALSPRGDWRAPSDMSARIWLDQIDREVPKG 

>Rv2439c proB glutamate 5-kinase TB.seq 2737118:2738245 MW:38789 SEQ ID NO:239
MRSPHRDAIRTARGLVVKVGTTALTTPSGMFDAGRLAGLAEAVERRMKAGSDWIVSSGAIAAGIEPL
GLSRRPKDLATKQAAASVGQVALVNSWSAAFARYGRTVGQVLLTAHDISMRVQHTNAQRTLDRLRA 
LHAVAIVNENDTVATNEIRFGDNDRLSALVAHLVGADALVLLSDIDGLYDCDPRKTADATFIPEVSGPA 
DLDGWAGRSSHLGTGGMASKVAAALLAADAGVPVLLAPAADAATALADASVGTVFAARPARLSAR 
RFVWRYAAEATGALTLDAGAVRAVNmQRRSLLAAGITAVSGRFCGGDNA/ELRAPDAAMVARGWAY 
DASELATMVGRSTSELPGELRRPWHADDLVAVSAKQAKQV

>Rv2440c obg Obg GTP-binding protein TB.seq 2738248:2739684 MW:50430 
SEQ ID NO:240 

VPRFVDRWIHTRAGSGGNGCASVHREKFKPLGGPDGGNGGRGGSIVFWDPQVHTLLDFHFRPHL 
TAASGKHGMGNNRDGAAGADLEVKVPEGTWLDENGRLLADLVGAGTRFEAAAGGRGGLGNAALA
SRVRKAPGFALLGEKGQSRDLTLELKTVADVGLVGFPSAGKSSLVSAISAAKPKIADYPFTTLVPNLG 
WSAGEHAFTVADVPGLIPGASRGRGLGLDFLRHIERCAVLVHWDCATAEPGRDPISDIDALETELA 
CYTPTLQGDAALGDLAARPRAWLNKIDVPEARELAEFVRDDIAQRGWPVFCVSTATRENLQPLIFGL 
SQMISDYNAARPVAVPRRPVIRPIPVDDSGFTVEPDGHGGFWSGARPERWIDQTNFDNDEAVGYL 
ADRLARLGVEEELLRLGARSGCAVTIGEMTFDWEPQTPAGEPVAMSGRGTDPRLDSNKRVGAAER
KAARSRRREHGDG 

>Rv2441c rpmA 50S ribosomal protein L27 TB.seq 2739773:2740030 MW:8969
SEQ ID NO:241 

MAHKKGASSSRNGRDSAAQRLGVKRYGGQWKAGEILVRQRGTKFHPGVNVGRGGDDTLFAKTAG
AVEFGIKRGRKTVSIVGSTTA 

>Rv2442c rplU 50S ribosomal protein L21 TB.seq 2740048:2740359 MW:11152
SEQ ID NO:242 

MMATYAIVKTGGKQYKVAVGDWKVEI<LESEQGEKVSLPVALWDGAT\mX)AKAlj^
HTKGPKIRIHKFKNKTGYHKRQGHRQQLTVLKVTGIA 

>Rv2448c valS valyl-tRNA synthase TB.seq 2747596:2750223 MW:97822 SEQ ID NO:243 
MLPKSWDPAAMESAIYQKWLDAGYFTADPTSTKPAYSIVLPPPh4VTGSLHMGHALEHTMMDALTRR 
KRMQGYEVLWQPGTDHAGIATQSWEQQLAVDGKTKEDLGRELFVDKVWDWKRESGGAIGGQMR
RLGDGVDWSRDRFTMDEGLSRAVRTIFKRLYDAGLIYRAERLVNWSPVLQTAISDLEVh4YRDVEGEL 
VSFRYGSLDDSQPHIWATTRVETMLGDTAIAVHPDDERYRHLVGTSLj^HPFVDRELAIVADEHVDPE 
FGTGAVKVTPAHDPNDFEIGVRHQLPMPSILDTKGRIVDTGTRFDGMDRFEARVAVRQALAAQGRV
VEEKRPYLHSVGHSERSGEPIEPRLSLQWVVVRVESLAKAAGDAVRNGDTVIHPASMEPRVVFSWVD 
DMHDWCISRQLWWGHRIPIVVYGPDGEQVCVGPDETPPQGWEQDPDVLD7VVFSSALWPFSTLGW 
PDKTAELEKFYPTSVLNTTGYDILFFVVVARMMMFGTFVGDDAAITLDGRRGPQVPFTDVFLHGLIRDE 
SGRKMSKSKGNVIDPLDWVEMFGADALRFTLARGASPGGDLAVSEDAVRASRNFGTKLFNATRYAL
LNGAAPAPLPSPNELTDADRWILGRLEEVRAEVDSAFDGYEFSRACESLYHFAWDEFCDWYLELAK 
TQU^QGLTHTTAVU\AGLDTLLRU.HPVIPFLTEALVVUid.TGRESLVSADWPEPSGISVDLVAAQRIND 
MQKLVTEVRRFRSDQGLADRQKVPARMHGVRDSDLSNQVAAVTSLAWLTEPGPDFEPSVSLEVRL 
GPEMNRTWVELDTSGTIDVAAERRRLEKELAGAQKELASTAAKLANADFLAKAPDAVIAKIRDRQRV 
AQQETERITTRLAALQ

>Rv2482c plsB2 TB.seq 2786915:2789281 MW:88284 SEQ ID NO:244 

VTKPAADASAVLTAEDTLVLASTATPVEMELIMGWLGQQRARHPDSKFDILKLPPRNAPPAALTALVE 
QLEPGFASSPQSGEDRSIVPVRVIWLPPADRSRAGKVAALLPGRDPYHPSQRQQRRILRTDPRRAR 

WAGESAKVSELRQQWRDTTVAEHKRDFAQFVSRRALLALARAEYRILGPQYKSPRLVKPEMLASA
RFRAGLDRIPGATVEDAGKMLDELSTGWSQVSVDLVSVLGRLASRGFDPEFDYDEYQVAAMRAALE 
AHPAVLLFSHRSYIDGWVPVAMQDNRLPPVHMFGGINLSFGLMGPLMRRSGMIFIRRNIGNDPLYK 
YVLKEYVGYWEKRFNLSWSIEGTRSRTGKMLPPKLGLMSYVADAYLDGRSDDILLQGVSICFDQLH 
EITEYAAYARGAEKTPEGLRWLYNFIKAQGERNFGKIYVRFPEAVSMRQYLGAPHGELTQDPAAKRL 

ALQKMSFEVAWRILQATPVTATGLVSALLLTTRGTALTLDQLHHTLQDSLDYLERKQSPVSTSALRLR
SREGVRAAADALSNGHPVTRVDSGREPVWYIAPDDEHAAAFYRNSVIHAFLETSIVELALAHAKHAE 
GDRVAAFWAQAMRLRDLLKFDFYFADSTAFRANIAQEMAWHQDWEDHLGVGGNEIDAMLYAKRPL 
MSDAMLRVFFEAYEIVADVLRDAPPDIGPEELTELALGLGRQFVAQGRVRSSEPVSTLLFATARQVAV 
DQELIAPAADLAERRVAFRRELRNILRDFDYVEQIARNQFVACEFKARQGRDRI 

>Rv2509 - putative oxidoreductase TB.seq 2824676:2825479 MW:28014 SEQ ID NO:245 
MPIPAPSPDARAWTGASQNIGAALATELAARGHHLIVTARREDVLTELAARLADKYRVTVDVRPADL 
ADPQERSKLADELAARPISILCAh4AGTATFGPIASLDLAGEKTQVQLNAVAVHDLTLAVLPGMIERKAG 
GILISGSAAGNSPIPYNATYAATKAFVNTFSESLRGELRGSGVHVTVLAPGPVRTELPDASEASLVEKL 
VPDFLWISTEHTARVSLNALERNKMRWPGLTSKAMSVASQYAPRAIVAPIVGAFYKRLGGS

>Rv2524c fas fatty acid synthase TB.seq 2840124:2849330 MW:326226 SEQ ID NO:246
VTIHEHDRVSADRGGDSPHTTHALVDRLMAGEPYAVAFGGQGSAWLETLEELVSATGIETELATLVG 
EAELLLDP\/TDELIWRPIGFEPLQWVRAI.AAEDPVPSDKHLTSAAVS\/PGVLLTQIAATRALARQGM 
DLVATPPVAMAGHSQGVl^VEALKAGGARDVELFALAQUGAAGTLVARRRGISVLGDRPPMVSVTN
ADPERIGRLLDEFAQDVR7VLPPVLSIRNGRRAWITGTPEQLSRFELYCRQISEKEEADRKNKVRGG 
DVFSPVFEPVQVEVGFHTPRLSDGIDIVAGWAEKAGLDVALARELADAILIRKVDWVDEITRVHAAGA 
RWILDLGPGDILTRLTAPVIRGLGIGIVPAATRGGQRNLFTVGATPEVARAWSSYAPTWRLPDGRVK
LSTKFTRLTGRSPIUJ^GMTPTTN^AKIVAAAANAGHWAELAGGGQVTEEIFGNRIEQMAGLLEPGRT 
YQFNALFLDPYLWKLQVGGKRLVQKARQSGAAIDGWISAGIPDLDEAVELIDELGDIGISHWFKPGT 
lEQIRSVIRIATEVPTKPVIMHVEGGRAGGHHSWEDLDDLLLATYSELRSRANITVCVGGGIGTPRRAA 
EYLSGRWAQAYGFPLMPIDGILVGTAAMATKESTTSPSVKRMLVDTQGTDQVVISAGKAQGGMASSR
SQLGADIHEIDNSASRCGRLLDEVAGDAEAVAERRDEIIAAMAKTAKPYFGDVAD^4TYLQWLRRWE 
LAIGEGNSTADTASVGSPWLADTWRDRFEQMLQRAEARLHPQDFGPIQTLFTDAGLLDNPQQAIAAL 
LARYPDAEWQLHPADVPFF\m.CKTLGKPVNFVPVIDQD\^RWWRSDSLWQAHDARYDADAVCIIP 
GTASVAGITRMDEPVGELLDRFEQAAIDEVLGAGVEPKDVASRRLGRADVAGPLAWLDAPDVRWA 

GRTVTNPVHRIADPAEWQVHDGPENPRATHSSTGARLQTHGDDVALSVPVSGTWVDIRFTLPANTV
DGGTPVIATEDATSAMRTVLAIAAGVDSPEFLPAVANGTATLTVDWHPERVADIHTGVrATFGEPLAP 
SLTNVPDALVGPCWPAVFAAIGSAVmDTGEPWEGU-SLVHLpHAARWGQLPTVPAQLTVTATAAN 
ATDTDMGRWPVSWVTGADGAVIATLEERFAILGRTGSAELADPARAGGAVSANATDTPRRRRRDV 
TITAPVDMRPFAWSGDHNPIHTDRAAALLAGLESPIVHGMWLSAAAQHAVTATDGQARPPARLVG 

mARFLGMVRPGDEVDFRVERVGIEXaGAEIVDVAARVGSDLVMSASARLAAPKTVYAFPGQGIQHK
GMGMEVRARSKAARKVWDTADKFTRDTLGFSVLHVVRDNPTSIIASGVHYHHPDGVLYLTQFTQVA 
MATVAAAQVAEMREQGAFVEGAIACGHSVGEYTALACVTGfYQLEALLEMVFHRGSKMHDIVPRDEL 
GRSNYRLAAIRPSQIDLDDADVPAFVAGIAESTGEFLEIVNFNLRGSQYAIAG7VRGLEALEAEVERRR 
ELTGGRRSFILVPGIDVPFHSRVLRVGVAEFRRSLDRVMPRDADPDLIIGRYIPNLVPRLFTLDRDFIQ 

EIRDLVPAEPLDEILADYDTWLRERPREMARTVFIELLAWQFASPVRWIETQDLLFIEEAAGGLGVERF
VEIGVKSSPTVAGLATNTLKLPEYAHS7VEVLNAERDAAVLFATDTDPEPEPEEDEPVAESPAPDWS 
EAAPVAPAASSAGPRPDDLVFDAADATLALIALSAKMRIDQIEELDSIESITDGASSRRNQLLVDLGSE 
LNLGAIDGAAESDLAGLRSQVTKLARTYKPYGPNA-SDAINDQLRTVLGPSGKRPGAIAERVKKTWELG 
EGWAKHVTVEVALGTREGSSVRGGAMGHLHEGALADAASVDKVIDAAVASVAARQGVSVALPSAG 

SGGGATIDAAALSEFTDQITGREGVLASAARLVLGQLGLDDPVNALPAAPDSELIDLVTAELGADWPR
LVAPVFDPKKAWFDDRWASAREDLVKLVVLTDEGDIDADWPRLAERFEGAGHWATQATWWQGKS 
LAAGRQIHASLYGRIAAGAENPEPGRYGGEVAXATTGASKGSIAASWARLLDGGATVIATTSKLDEER 
LAFYRTLYRDHARYGAALWLVAANMASYSDVDALVEWIGTEQTESLGPQSIHIKDAQTPTLLFPFAAP 
RWGDLSEAGSRAEMEMKVLLWAVQRLIGGLSTIGAERDIASRLHWLPGSPNRGMFGGDGAYGEA 

KSALDAWSRWHAESSWAARVSLAHALIGWTRGTGLMGHNDAIVAAVEEAGVTTYSTDEMAALLLD
LCDAESKVAAARSPIKADLTGGLAEANLDMAELAAKAREQMSAAAAVDEDAEAPGAIAALPSPPRGF 
TPAPPPQWDDLDVDPADLWIVGGAEIGPYGSSRTRFEMEVENELSAAGVLELAWTTGLIRWEDDP 
QPGWYDTESGEMVDESELVQRYHDAWQRVGIREFVDDGAIDPDHASPLLVSVFLEKDFAFWSSE 
ADARAFVEFDPEHTVIRPVPDSTDWQVIRKAGTEIRVPRKTKLSRWGGQIPTGFDPTVWGISADMA 

GSIDRLAVWNMVATVDAFLSSGFSPAEVMRYVHPSLVANTQGTGMGGGTSMQTMYHGNLLGRNKP
NDIFQEVLPNIIAAHWQSYVGSYGAMIHPVAACATAAVSVEEGVDKIRLGKAQLWAGGLDDLTLEGII 
GFGDMAATADTSMMCGRGIHDSKFSRPNDRRRLGFVEAQGGGTILLARGDLALRMGLPVLAWAFA 
QSFGDGVHTSIPAPGLGALGAGRGGKDSPLARALAKLGVAADDVAVISKHDTSTLANDPNETELHER
LADALGRSEGAPLFWSQKSLTGHAKGGAAVFQMMGLCQILRDGVIPPNRSLDCVDDELAGSAHFV 
WVRDTLRLGGKFPLKAGMLTSLGFGHVSGLVALVHPQAFIASLDPAQRADYQRRADARLLAGQRRL 
ASAIAGGAPMYQRPGDRRFDHHAPERPQEASMLLNPAARLGDGEAYIG 

>Rv2555c alaS alanyl-tRNA synthase TB.seq 2873772:2876483 MW:97326 SEQ ID NO:247
VQTHEIRKRFLDHFVKAGHTEVPSASVILDDPNLLFVNAGMVQFVPFFLGQRTPPYPTATSIQKCIRTP 
DIDEVGITTRHNTFFQMAGNFSFGDYFKRGAIELAWALLTNSLAAGGYGLDPERIWrrVYFDDDEAV 
RLWQEVAGLPAERIQRRGMADNYWSMGIPGPCGPSSEIYYDRGPEFGPAGGPIVSEDRYLEVWNL 

VFMQNERGEGTTKEDYQILGPLPRKNIDTGMGVERIALVLQDVHNVYETDLLRPVID7VARVAARAYD
VGNHEDDVRYRIIADHSRTAAJLIGDGVSPGNDGRGYVLRRLLRRVIRSAKLLGIDAAIVGDLMATVRN 
AMGPSYPELVADFERISRIAVAEETAFNRTLASGSRLFEEVASSTKKSGATVLSGSDAFTLHDTYGFPI 
ELTLEMAAETGLQVDEIGFRELMAEQRRRAKADAAARKHAHADLSAYRELVDAGATEFTGFDELRS 
QARILGIFVDGKRVPWAHGVAGGAGEGQRVELVLDRTPLYAESGGQIADEGTISGTGSSEAARAAV 

TDVQKIAKTLVVVHRVNVESGEFVEGDTVIAAVDPGWRRGATQGHSGTHMVHAALRQVLGPNAVCaA
GSLNRPGYLRFDFNWQGPLTDDQRTQVEEVTNEAVQADFEVRTFTEQLDKAKAMGAIALFGESYPD 
EVRWEMGGPFSLELCGGTHVSNTAQIGPVTILGESSIGSGVRRVEAYVGLDSFRHLAKERALMAGL 
ASSLKVPSEEVPARVANLVERLRAAEKELERVRMASARAAATNAAAGAQRIGNVRLVAQRMSGGMT 
AADLRSLIGDIRGKLGSEPAWALIAEGESQTVPYAVAANPAAQDLGIRANDLVKQLAVAVEGRGGGK 

ADLAQGSGKNPTGIDAALDAVRSEIAVIARVG

>Rv2580c hisS histidyl-tRNA synthase TB.seq 2904822:2906090 MW:45118 SEQ ID NO:248
VTEFSSFSAPKGVPDYVPPDSAQFVAVRDGLLAAARQAGYSHIELPIFEDTALFARGVGESTDWSKE 
MYTFADRGDRSVTLRPEGTAGWRAVIEHGLDRGALPVKLCYAGPFFRYERPQAGRYRQLQQVGV 
EAIGVDDPALDAEVIAIADAGFRSLGLDGFRLEITSLGDESCRPQYRELLQEFLFGLDLDEDTRRRAGI
NPLRVLDDKRPELRAMTASAPVLLDHLSDVAKQHFD7VLAHLDALGVPYVINPRMVRGLDYYTKTAF 
EFVHDGLGAQSGIGGGGRYDGLMHQLGGQDLSGIGFGLGVDRTVLALRAEGKTAGDSARCDVFGV 
PLGEAAKLRLAVLAGRLRAAGVRVDLAYGDRGLKGAMRAAARSGARVALVAGDRDIEAGTVAVKDL 
TTGEQVSVSMDSWAEVISRLAG 

>Rv2614c thrS threonyl-tRNA synthase TB.seq 2941190:2943265 MW:77123 SEQ ID NO:249
MSAPAQPAPGVDGGDPSQARIRVPAGTTAATAVGEAGLPRRGTPDAIWVRDADGNLRDLSWVPD 
VDTDITPVAANTDDGRSVIRHSTAHVLAQAVQELFPQAKLGIGPPITDGFYYDFDVPEPFTPEDLAALE 
KRMRQIVKEGQLFDRRVYESTEQARAELANEPYKLELVDDKSGDAEIMEVGGDELTAYDNLNPRTR 
ERVWGDLCRGPHIPTTKHIPAFKLTRSSAAYWRGDQKNASLQRIYGTAWESQEALDRHLEFIEEAQR
RDHRKLGVELDLFSFPDEIGSGLAVFHPKGGIVRRELEDYSRRKHTEAGYQFVNSPHITKAQLFHTSG 
HLDWYADGMFPPMHIDAEYNADGSLRKPGQDYYLKPMNCPMHCLIFRARGRSYRELPLRLFEFGTV 
YRYEKSGWHGLTRVRGLTMDDAHIFCTRDQMRDELRSLLRFVLDLLADYGLTDFYLELSTKDPEKF
VGAEEVVVEEATTVLAEVGAESGLELVPDPGGAAFYGPKISVQVKDALGRTWQMSTIQLDFNFPERF 
GLEYTAADGTRHRPVMIHRALFGSIERFFGILTEHYAGAFPAWLAPVQWGIPVADEHVAYLEEVATQ 
LKSHGNmAEVDASDDRMAKKIVHHTNHKVPFMVLAGDRDVAAGAVSFRFGDRTQINGVARDDAVAA 
IVAWIADRENAVPTAELVKVAGRE

>Rv2697c dut deoxyuridine triphosphatase TB.seq 3013683:3014144 MW:15772 SEQ ID NO:250 
VSTTLAIVRLDPGLPLPSRAHDGDAGVDLYSAEDVELAPGRRALVRTGVAVAVPFGMVGLVHPRSGL 
ATRVGLSIVNSPGTIDAGYRGEIKVALINLDPAAPIWHRGDRIAQLLVQRVELVELVEVSSFDEAGLAS 
TSRGDGGHGSSGGHASL

>Rv2782c pepR protease/peptidase, M16 family (insulinase) TB.seq 3089045:3090358 MW:47074
SEQ ID NO:251

MPRRSPADPAAALAPRRTTLPGGLRWTEFLPAVHSASVGVWVGVGSRDEGATVAGAAHFLEHLLF 
KSTPTRSAVDIAQAMDAVGGELIWTAKEHTCYYAHVLGSDLPLAVDLVADWLNGRCAADDVEVER
PWLEEIAMRDDDPEDALADMFLAALFGDHPVGRPVIGSAQSVSVMTRAQLQSFHLRRYTPERMW 
AAAGNVDHDGLVALVREHFGSRLVRGRRPVAPRKGTGRVNGSPRLTLVSRDAEQTHVSLGIRTPGR 
GWEHRWALSVLHTALGGGLSSRLFQEVRETRGLAYSVYSALDLFADSGALSVYAACLPERFADVMR 
VTADVLESVARDGITEAECGIAKGSLRGGLVLGLEDSSSRMSRLGRSELNYGKHRSIEHTLRQIEQVT 
VEEVNAVARHLLSRRYGAAVLGPHGSKRSLPQQLRAMVG

>Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase TB.seq
3090339:3092594 MW:79736 SEQ ID NO:252 

MSAAEIDEGVFETTATIDNGSFGTRTIRFETGRLALQAAGAWAYLDDDNMLLSATTASKNPKEHFDF 
FPLTVDVEERMYAAGRIPGSFFRREGRPSTDAILTCRLIDRPLRPSFVDGLRNEIQIWTILSLDPGDLY 

DVLAINAASASTQLGGLPFSGPIGGVRVALIDGTWVGFPTVDQIERAVFDMWAGRIVEGDVAIMMVE
AEATENWELVEGGAQAPTESWAAGLEAAKPFIAALCTAQQELADAAGKSGKPTVDFPVFPDYGED 
VYYSVSSVATDELAAALTIGGKAERDQRIDEIKTQWQRLADTYEGREKEVGAALRALTKKLVRQRILT 
DHFRIDGRGITDIRALSAEVAWPRAHGSALFERGETQILGVTTLDMIKMAQQIDSLGPETSKRYMHH 
YNFPPFSTGETGRVGSPKRREIGHGALAERALVPVLPSVEEFPYAIRQVSEALGSNGSTSMGSVCAS 

TLALLNAGVPLKAPVAGIAMGLVSDDIQVEGAVDGWERRFVTLTDILGAEDAFGDMDFKVAGTKDFV
TALQLDTKLDGIPSQV^GALEQAKDARLTILEVMAEAIDRPDEMSPYAPRVTTIKVPVDKIGEVIGPK 
GKVINAITEETGAQISIEDDGTVFVGATDGPSAQAAIDKINAIANPQLPTVGERFLGTVVKTTDFGAFVS 
LLPGRDGLVHISKLGKGKRIAKVEDWNVGDKLRVEIADIDKRGKISLILVADEDSTAAATDAATVTS 
>Rv2793c truB tRNA pseudouridine 55 synthase TB.seq 3102364:3103257 MW:31821
SEQ ID NO:253

MSATGPGIWIDKPAGMTSHDWGRCRRIFATRRVGHAGTLDPMATGVLVIGIERATklLGLLTAAPKS 
YAATIRLGQTTSTEDAEGQVLQSVPAKHLTIEAIDAAMERLRGEIRQVPSSVSAIKVGGRRAYRLARQ 
GRSVQLEARPIRIDRFELLAARRRDQLIDIDVEIDCSSGTYIRALARDLGDALGVGGHVTALRRTRVGR
FELDQARSLDDLAERPALSLSLDEACLLMFARRDLTAAEASAAANGRSLPAVGIDGVYAACDADGRVl 
ALLRDEGSRTRSVAVLRPATMHPG

>Rv2797c - TB.seq 3105619:3107304 MW:58761 SEQ ID NO:254

VPL7VADIDRWNAQAVREVFHAASARAEVTFEASRQLAALSIFANSGGKTAEAAAHHNAGIRRDLDA 
HGNEALAVARAADRAADGIVKVQSELAALRHAAAAAELTIDALINRWPIPGLRSTEAQWARTLAKQT 
ELQAELDAIMAEANAVDEELASAVNMADGDAPIPADSGPPVGPEGLTPTQLASDANEERLREERARL 
QAHLERLQAEYDQLSVRAARDYHNGILDGDAVGRLAALTDELSAARGRLGELDAVDEALSRAPETYL 
TQLQIPEDPNQQVLAAVAVGNPDTAANVSVTVPGVGSTTRGALPGMVTEARDLRSEVIRQLNAAGK
PASVAT1AWMGYHPPPNPLDTGSAGDLWQTMTDGQAHAGAADLSRYLQQVRANNPSGHL7VLGHS 
YGSLTASLALQDLDAQSAHPVNDWFYGSPGLELYSPAQLGLDHGHAYVMQAPHDLITNLVAPLAPL 
HGWGLDPYLTPGFTELSSQAGFDPGGIWRDGVYAHGDYPRSFLDAAGQPQLRMSGYNLAAIAAGL 
PDN7VGPPLLPPILGGGMPAAPGPALRGGR 

>Rv2864c ponA2 TB.seq 3175454:3177262 MW:63015 SEQ ID NO:255

MVTKTTLASATSGLLLLAWAMSGCTPRPQGPGPAAEKFFAALAIGDTASAAQLSDNPNEAREALNA 
AWAGLQAAHLDAQVLSAKYAEDTGWAYRFSWHLPKDRIWTYDGQLKMARDEGRVVHVRWTTSGL 
HPKLGEHQTFALRADPPRRASVNEVGGTDVLVPGYLYHYSLDAGQAGRELFGTAHAWGALHPFDD 

TLNDPQLLAEQASSSTQPLDLVTLHADDSNRVAAAIGQLPGWITPQAELLPTDKHFAPAVLNDVKKA
\A^ELDGKAGWRWSVNQNGVDVS\n.HEVAPSPASSVSITLDRWQNAAQHAVNTRGGKAMIWIK 
PSTGEILAIAQNAGADADGPVATTGLYPPGSTFKMITAGAAVERDLATPETLLGCPGEIDIGHRTIPNY 
GGFDLGWPMSRAFASSCNTTFAELSSRLPPRGLTQAARRYGIGLDYQVDGITTVTGSVPPTVDLAE 
RTEDGFGQGKVAJ^SPFGMALVAATVAAGKTPVPQLIAGRPTAVEGDATPISQKMIDALRPMMRLWT 

NGTAKEIAGCGEVFGKTGEAEFPGGSHSWFAGYRGDLAFASLIVGGGSSEYAVRMTKV/MFESLPPG
YLA 

>Rv2868c gcpE TB.seq 3179368:3180528 MW:40451 SEQ ID NO:256 

VTVGLGMPQPPAPTLAPRRATRQLMVGNVGVGSDHPVSVQSMCTTKTHDVNSTLQQIAELTAAGC 
DIVRVACPRQEDADALAEjARHSQIPWADIHFQPRYIFAAIDAGCAAVRVNPGNIKEFDGRVGEVAKA 

AGAAGIPIRIGVNAGSLDKRFMEKYGKATPEALVESALWEASLFEEHGFGDIKISVKHNDPWMVAAY
ELLAARCDYPLHLGVTEAGPAFQGTIKSAVAFGALLSRGIGDTIRVSLSAPPVEEVKVGNQVLESLNL 
RPRSLEIVSCPSCGRAQVDVYTLANEVTAGLDGLDVPLRVAVMGCWNGPGEAREADLGVASGNGK 
GQIFVRGEVIKTVPEAQIVETLIEEAMRLAAEMGEQDPGATPSGSPIVTV/S 
>Rv2869c - TB.seq 3180548:3181759 MW:42835 SEQ ID NO:257

MMFVn^GIVLFALAILISVALHECGHMWVARRTGMKVRRYFVGFGPTLWSTRRGETEYGVKAVPLGG
FCDIAGMTPVEELDPDERDRAMYKQATWKRVAVLFAGPGMNLAICLVLIYAIALVWGLPNLHPPTRAV 
IGETGCVAQEVSQGKLEQCTGPGPAALAGIRSGDVWKVGDTPVSSFDEMAAAVRKSHGSVPIWE 
RDGTAIVTYVDIESTQRWIPNGQGGELQPATVGAJGVGAARVGPVRYGVFSAMPATFAVTGDLTVEV
GKALAALPTKVGAL\mAIGGGQRDPQTPISWGASIIGGDTVDHGLVVVAFVVFFl_AQLNLILAAINLLPL 
LPFDGGHIAVAVFERIRNMVRSARGKVAAAPVNYLKLLPATYWLVLWGYMLLTVTADLVNP^ 
>Rv2870c - TB.seq 3181770:3183077 MW:45324 SEQ ID NO:258 
VATGGRWlRRRGDNEWAHNDEVTNSTDGRADGRLRNA/VLGSTGSIGTQALQVIADNPDRFEWG
LAAGGAHLDTLLRQRAQTGVTNIAVADEHAAQRVGDIPYHGSDAATRLVEQTEADWLNALVGALGL 
RPTLAALKTGARLALANKESLVAGGSLVLRAARPGQIVPVDSEHSALAQCLRGGTPDEVAKLVLTAS 
GGPFRGWSAADLEHVTPEQAGAHPTWSMGPMNTLNSASLVNKGLEV1ETHLLFGIPYDRIDWVHP 
QSIIHSMVTFIDGSTIAQASPPDMKLPISLALGWPRRVSGAAAACDFHTASSWEFEPLDTDVFPAVEL 
ARQAGVAGGCMTAVYNAANEEAAAAFLAGRIGFPAIVGIIADVLHAADQWAVEPATVDDVLDAQRWA
RERAQRAVSGMASVAIASTAKPGAAGRHASTLERS 

>Rv2922c smc member of Smc1/Cut3/Cut14 family TB.seq 3234189:3238055 MW:139610
SEQ ID NO:259 

VGAGSRFPLVDPLPSVGARPDRLRGQPRRRTRAGGRPGSARCVPEAAAAAAGRHDTGPRRQSRR 

RLVAVDGADHRVQRAVIWPLNmKSLTLKGFKSFAAPTTLRFEPGITAWGPNGSGKSNWDALAWV
MGEQCAKTLRGGKMEDVIFAGTSSRAPLGRAEVrVSIDNSDNALPIEVTEVSITRRMFRDGASEYEIN 
GSSCRLMDVQELLSDSGIGREMHVIVGQGKLEEILQSRPEDRRAFIEEAAGVLXHRKRKEKALRKLDT 
MAANUVRLTDLTTELRRQLKPLGRQAEAAQRAAAIQADLRDARLRLAADDLVSRRAEREAVFCaAEAA 
MRREHDEAAARLAVASEELAAHESAVAELSTRAESIQHTWFGLSALAERVDATVRIASERAHHLDIEP 

VAVSDTDPRKPEELEAEAQQVAVAEQQLLAELDAARARLDAARAELADRERRAAEADRAHLAAVRE
EADRREGLARLAGQVETMRARVESIDESVARLSERIEDAAMRAQQTRAEFETVQGRIGELDQGEVG 
LbEHHERTVAALRLADERVAELQSAERAAERQVASLRARIDALAVGLQRKDGAAVVLAHNRSGAGLF 
GSIAQLVKVRSGYEAALAAALGPAADALAVDGLTAAGSAVSALKQADGGRAVLVLSDWPAPQAPQS 
ASGEMLPSGAQWALDLVESPPQLVGAMIAMLSGVAWNDLTEAMGLVEIRPELRAVTVDGDLVGAG 

WVSGGSDRKLSTLEVnrSEIDKARSELAAAEALAAQLNAAU^GALTEQSARQDAAEQAUVALNESDTAI
SAMYEQLGRLGQEARAAEEEWNRLLQQRTEQEAVRTQTLDDVIQLETQLRKAQETQRVQVAQPIDR 
QAISAAADRARGVEVEARLAVRTAEERANAVRGRADSLRRAAAAEREARVRAQQARAARLHAAAVA 
AAVADCGRLLAGRLHRAVDGASQLRDASAAQRQQRLAAMAAVRDEVNTLSARVGELTDSLHRDEL 
ANAQAALRIEQLEQMVLEQFGMAPADLITEYGPHVALPPTELEMAEFEQARERGEQVIAPAPMPFDR 

VTQERRAKRAERALAELGRVNPLALEEFAALEERYNFLSTQLEDVKAARKDLLGWADVDARILQVFN
DAFVDVEREFRGVFTALFPGGEGRLRLTEPDDMLTTGIEVEARPPGKKITRLSLLSGGEKALTAVAML 
VAIFRARPSPFYIMDEVEAALDDVNLRRLLSLFEQLREQSQIIIITHQKPTMEVADALYGVTMQNDGITA 
VISQRMRGQQVDQLVTNSS 

>Rv2925c rnc RNAse III TB.seq 3239829:3240548 MW:25400 SEQ ID NO:260
MIRSRQPLLDALGVDLPDELLSLALTHRSYAYENGGLPTNERLEFLGDAVLGLTITDALFHRHPDRSE
GDLAKLRASWNTQALADVARRLCAEGLGVHVLLGRGEANTGGADKSSILADGMESLLGAIYLQHGM 
EKAREVILRLFGPLLDAAPTLGAGLDWKTSLQELTAARGLGAPSYLVTSTGPDHDKEFTAWWMDS
EYGSGVGRSKKEAEQKAAAAAWKALEVLDNAMPGKTSA 

>Rv2934 ppsD TB.seq 3262245:3267725 MW:193317 SEQ ID NO:261 
lvrrSU^RAAQLSPNARAAU\REL\mAGTTFPTDICEPVAWGIGCRFPGNVrGPESFWQLLADGVDT
lEQVPPDRWDADAFYDPDPSASGRMTTKWGGFVSDVDAFDADFFGITPREAVAMDPQHRMLLEVA 
WEALEHAGIPPDSLSGTRTGVMMGLSSWDYTIVNIERRADIDAYLSTGTPHCAAVGRIAYLLGLRGPA 
VAVDTACSSSLVAIHLACQSLRLRETDVALAGGVQLTLSPFTAIALSKWSALSPTGRCNSFDANADGF 
VRGEGCGWVLKRLADAVRDQDRVI^WRGSATNSDGRSNGMTAPNA1^QRDVITSAIJ<LAD\/TPD 

SVNYVETHGTGTVLGDPIEFESLAATYGLGKGQGESPCALGSVKTNIGHLEAAAGVAGFIKAVLAVQR
GHIPRNLHFTRWNPAIDASATRLFVPTESAPWPAAAGPRRAAVSSFGLSGTNAHWVEQAFDTAVAA 
AGGMPYVSyVLNVSGKTAARVASAAAVLADWMSGPGAAAPLADVAHTLNRHRARHAKFATVIARDRA 
EAIAGLRALAAGQPRVGWDCDQHAGGPGRVFVYSGQGSQWASMGQQLLANEPAFAKAVAELDPI 
FVDQVGFSLCMTLIDGDEWGIDRIQPVLVGMQLALTELWRSYGVIPDAVIGHSMGEVSAAWAGALT 

PEQGLRVITTRSRLMARLSGQGAMALLELDADAAEALIAGYPQVTLAVHASPRQTVIAGPPEQVDT^
AAVATQNRU^RVEVDVASHHPIIDPILPELRSALADLTPQPPSIPIISTTYESAQPVADADYWSANLRN 
PVRFHQAVTAAGVDHNTFIEISPHPVLTHALTDTLDPDGSHTVMSTMNRELDQTLYFHAQLAAVGVA 
ASEHTTGRLVDLPPTPWHHQRFWVTDRSAMSELAATHPLLGAHIEMPRNGDHVWQTDVGTEVCPW 
UMDHKVFGQPIMPAAGFAEIALAAASEALGTAADAVAPNIVINQFEVEQMLPLDGHTPLTTQLtRGGDS 

QIRVEIYSRTRGGEFCRHATAKVEQSPRECAHAHPEAQGPATGTTVSPADFYALLRQTGQHHGPAF
AALSRIVRLADGSAETEISIPDEAPRHPGYRLHPWLDAALQSVGAAIPDGEIAGSAEASYLPVSFETIR 
VYRDIGRHVRCRAHLTNLDGGTGKMGRIVLINDAGHIAAEVDGIYLRRVERRAVPLPLEQKIFDAEWT 
ESPIAAVPAPEPAAETTRGSWLVLADATVDAPGKAQAKSMADDFVQQVVRSPMRRVHTADIHDESAV 
LAAFAETAGDPEHPPVGVWFVGGASSRLDDEUW^RDTVWSITTWRAWGTWHGRSPRLWLVTG 

GGLSVADDEPGTPAAASLKGLVRVLAFEHPDMRTTLVDLDITQDPLTALSAELRNAGSGSRHDDVIA
WRGERRFVERLSRATIDVSKGHPWRQGASYWTGGLGGLGLWARVyA.VDRGAGRWLGGRSDPT 
DEQCr4VLAELQTRAEIWVRGDVASPGVAEKLIETARQSGGQLRGWHAAAVIEDSLVFSMSRDNLE 
R\/WAPKATGALRMHEATADCELDWV\rt.GFSSAASLLGSPGQAAYACASAWLDALVGWRRASGLPA 
AVINWGPWSEVGVAQALVGSVLDTISVAEGIEALDSLLAADRIRTGVARLRADRALVAFPEIRSISYFT 

QWEELDSAGDLGDWGGPDALADLDPGEARRAVTERMCARIAAVMGYTDQSTVEPAVPLDKPLTEL
GLDSLMAVRIRNGARADFGVEPPVALILQGASLHDLTADLMRQLGLNDPDPALNNADTIRDRARQRA 
AARHGAAMRRRPKPEVQGG 

>Rv2946c pks1 TB.seq 3291503:3296350 MW:166642 SEQ ID NO:262 
VISARSAEALTAQAGRLMAHVQANPGLDPIDVGCSLASRSVFEHRAVWGASREQLIAGLAGLAAGE
PGAGVAVGQPGSVGKTVWFPGQGAQRIGMGRELYGELPVFAQAFDAVADELDRHLRLPLRDVIW 
GADADLLDSTEFAQPALFAVEVASFAVLRDWGVLPDFVMGHSVGELAAAHAAGVLTLADAAMLWA 
RGRLMQALPAGGAMVAVAASEDEVEPLLGEGVGIAAINAPESWISGAQAAANAIADRFAAQGRRVH
QLAVSHAFHSPUVIEPMLEEFARVAARVQAREPQLGLVSNVTGELAGPDFGSAQYVVVDHVRRPVRF 
ADSARHLQTLGATHFIEAGPGSGLTGSIEQSLAPAEAMWSMLGKDRPELASALGAAGQVFTTGVPV 
QWSAVFAGSGGRRVQLPTYAFQRRRFWETPGADGPADAAGLGLGATEHALLGANA/ERPDSDEWL 
TGRLSI^QPWLADHWNGWLFPGAGFVELVIRAGDEVGCALIEELVLAAPLVMHPGVGVQVQWV
GAADESGHRAVSVYSRGDQSQGWLLNAEGMLGVAAAETPMDLSVWPPEGAESVDISDGYAQLAE 
RGYAYGPAFQGLVAIWRRGSELFAEWAPGEAGVAVDRMGMHPAVLDAVLHALGLAVEKTQASTET 
RLPFCWRGVSLHAGGAGRVRARFASAGADAISVDVCDATGLPVL7VRSLVTRPITAEQLRAAV7AAG 
GASDQGPLEWWSPISWSGGANGSAPPAPVSWADFCAGSDGDASVWWELESAGGQASSWGS 

WAATHTALEVLQSWLGADRAATLWLTHGGVGLAGEDISDU^AAAVWGMARSAQAENPGRIVLIDT
DAAVDASVLAGVGEPQLLVRGGTVHAPRLSPAPALLALPAAESAWRLAAGGGGTLEDLVIQPCPEV 
QAPLQAGQVRVAVAAVGVNFRDWAALGMYPGQAPPLGAEGAGWLETGPEVTDLAVGDAVMGFL 
GGAGPLAWDQQLVn-RVPQGWSFAQAAAVPVVFLTAWYGLADLAEIKAGESVLIHAGTGGVGMAAV 
QLARQWGVEVF\n"ASRGKWDTLRAMGFDDDHIGDSRTCEFEEKFLAVTEGRGVDWLDSI_AGEFV 

DASLRLLVRGGRFLEMGKTDIRDAQEIAANYPGVQYRAFDLSEAGPARMQEMLAEVRELFDTRELH
RLPVmWDVRCAPAAFRFMSCW^HIGKWLTMPSAIJkDRLADGTWITGATGAVGGVLARHLVGAY 
GVRHLVLASRRGDRAEGAAELAADLTEAGAKVQWACDVADRAAVAGLFAQLSREYPPVRGVIHAA 
GVLDDAVITSLTPDRIDTVLRAKVDAAWNLHQATSDLDLSMFALCSSIAA7VGSPGQGNYSAANAFLD 
GLA^HRQAAGIJ^GISUVWGLWEQPGGIV^■AHLSSRDLARMSRSGiJ^PMSPAEAVELFDAAL^ 

AVATLLDRAALDARAQAGALPALFSGLARRPRRRQIDDTGDATSSKSALAQRLHGLAADEQLELLVG
LVCLCJAAAVLGRPSAEDVDPDTEFGDLGFDSLTAVELRNRLKTATGLTLPPTVIFDHPTPTAVAEYVA 
QQMSGSRPTESGDPTSQWEPAAAEVSVHA 

>Rv3014c ligA DNA ligase TB.seq 3372545:3374617 MW:75258 SEQ ID NO:263
VSSPDADQTAPEVLRQWQALAEEVREHQFRYYVRDAPIISDAEFDELLRRLEALEEQHPELRTPDSP 

TQLVGGAGFATDFEPVDHLERMLSLDNAFTADELAAWAGRIHAEVGDAAHYLCELXIDGVALSLVYR
EGRLTRASTRGDGRTGEDVTLNARTIADVPERLTPGDDYPVPEVLEVRGEVFFRLDDFCaALNASLVE 
EGKAPFANPRNSAAGSLRQKDPAVTARRRLRMICHGLGHVEGFRPATLHQAYLALRAWGLPVSEHT 
TLATDLAGVRERIDYWGEHRHEVDHEIDGVWKVDEVALQRRLGSTSRAPRWAIAYKYPPEEAQTKL 
LDIRVNVGRTGRITPFAFMTPVKVAGSTVGQATLHNASEIKRKGVLIGDTWIRKAGDVIPEVLGPWE 

LRDGSEREFIMPTTCPECGSPLAPEKEGDADIRCPNARGCPGQLRERVFHVASRNGLDIEVLGYEAG
VAIXQAKVIADEGELFALTERDLLRTDLFRTKAGELSANGKRLLVNLDKAKAAPLVVRVLVALSIRHVGP 
TAARAIJ^TEFGSLDAIAAASTDQb^AVEGVGPTIAAAVTEWFAVDWHREIVDKWRAAGVRMyDERD 
ESyPRTLAGLTIWTGSLTGFSRDDAKE>MVARGGKAAGSVSKKTNYWAGDSPGSKYDKAVELGVPI 
LDEDGFRRLLWDGPASRT 

>Rv3025c - NifS-like protein TB.seq 3383885:3385063 MW:40948 SEQ ID NO:264

MAYLDHAATTPMHPAAIEAMAAVQRTIGNASSLHTSGRSARRRIEEARELIADKLGARPSEVIFTAGG 
TESDNU^VKGIYWARRDAEPHRRRIVTTEVEHHAVLDSVNWLVEHEGAHVTWLPTAADGSVSATAL 
REALQSHDDVALVSVMWANNEVGTILPIAEMSWAMEFGVPMHSDAIQAVGQLPLDFGASGLSAMS
VAGHKFGGPPGVG/U.LLRRDVTCVPLMHGGGQERDIRSGTPDVASAVGMATAAQIAVDGLEENSAR 
LRLLRDRLVEGVLAEIDDVCLNGADDPMRLAGNAHFTFRGCEGDALLMLLDANGIECSTGSACTAGV 
AQPSHVLIAMGVDAASARGSLRLSLGHTSVEADVDAALEVLPGAVARARRAALAAAGASR 

>Rv3080c pknK serine-threonine protein kinase TB.seq 3442656:3445985 MW:119420
SEQ ID NO:265 

MTDVDPHATRRDLVPNIPAELLEAGFDNVEEIGRGGFGWYRCVQPSLDRAVAVKVLSTDLDRDNLE 
RFLREQRAIVIGRLSGHPHIVTVLQVGVIJ^GGRPFIVMPYHAKNSLETL1RRHGPLDWRETLSIGVK1_A 

GALEAAHRVGTLHRDVKPGNILLTDYGEPQLTDFGIARIAGGFETATGVIAGSPAFTAPEVLEGASPTP
ASDVYSLGATLFCALTGHAAYERRSGERVIAQFLRITSQPIPDLRKQGLPADVAAAIERAMARHPADR 
PATAADVGEELRDVQRRNGVSVDEMPLPVELGVERRRSPEAHAAHRHTGGGTPWPTPPTPATKY 
RPSVPTGSLVTRSRLTDILRAGGRRRLILIHAPSGFGKSTLAAQWREELSRDGAAVAWLTIDNDDNNE 
VWFLSHLLESIRRVRPTLAESLGHVLEEHGDDAGRYVLTSLIDEIHENDDRIAWIDDWHRVSDSRTQ 

AALGFLLDNGCHHLQLIVTSWSRAGLPVGRLRIGDELAEIDSAALRFDTDEAAALLNDAGGLF^PRAD
VQALTTSTDGWAAALRLAALSLRGGGDATQLLRGLSGASDVIHEFLSENVLDTLEPELREFU.VASVT 
ERTCGGLASALAGITNGRAMLEEAEHRGLFLQRTEDDPNWFRFHQMFADFLHRRLERGGSHRVAEL 
HRRASAWFAENGYLHEAVDHAlJKAGDPARAVDLVEQDETNLPEQSK^rrTIJJ^IVQKLPTSMWSRA 
RLQLAIAWANILLQRPAPATGALNRFETALGRAELPEATQADLRAEADVLRAVAEVFADRVERVDDLL 

AEAMSRPDTLPPRVPGTAGNTAALAAICRFEFAEVYPLLDWAAPYQEMMGPFGTVYAQCLRGMAAR
NRLDIVAALQNFRTAFEVGTAVGAHSHAARLAGSLLAELLYETGDUVGAGRLMDESYLLGSEGGAVD 
YLAARYVIGARVKAAQGDHEGAADRLSTGGDTAVQLGLPRLAARINNERIRLGIALPAAVAADLl_APR 
TlPRDNGIATMTAELDEDSAVRLLSAGDSADRDQACQRAGALAAAIDGTRRPLAALQAQILHIETLAAT
GRESDARNELAPVATKCAELGLSRLLVDAGUV 

>Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase TB.seq 3474004:3475371
MW:49342 SEQ ID NO:266 

MRPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIKSISKQFE 
KTAEDPRFRFFGNVWGEHVQPGELSERYDAVIYAVGAQSDRMLNIPGEDLPGSIAAVDFVGWYNA 
HPHFEQVSPDLSGARAWIGNGNVALDVARILLTDPDVLARTDIADHALESLRPRGIQEWIVGRRGPL 

QAAFTTLELRELADLDGVDVVIDPAELDGITDEDAAAVGKVCKQNIKVLRGYADREPRPGHRRMVFR
FLTSPIEIKGKRKVERIVLGRNELVSDGSGRVAAKDTGEREELPAQLWRSVGYRGVPTPGLPFDDQ 
SGTIPNVGGRINGSPNEYWGWIKRGPTGVIGTNKKDAQDTVDTLIKNLGNAKEGAECKSFPEDHAD 
QVADWLAARQPKLVTSAHWQVIDAFERAAGEPHGRPRVKLASLAELLRIGLG 
>Rv3235 - TB.seq 3611296:3611934 MW:22659 SEQ ID NO:267

MMASNQTAAQHSSATLQQAPRSIDDAGGCPLTISPIANSPGDTFAVTPWEYEPPPRNIPPCGQSSH
AARRPHTPQLARRQPIRPSGRAPAAVTSTAKSPRLRQAGTFADAALRRVLEViDRRRPVGQLRPLLA

WO 01/35317 PCT/US00/31152

PGLVDSVLAVSRTAAGHQQGAAMLRRIRLTPAGPDTADTAAEVFGTYSRGDRIHAIACRVEQRPAGN 
ETRWLMVALHIG 

>Rv3255c manA mannose-6-phosphate isomerase TB.seq 3635040:3636263 MW:43340 
SEQ ID NO:268 

VELLRGALRTYAWGSRTAIAEFTGRPVPAAHPEAELWFGAHPGDPAWLQTPHGQTSLLEALVADPE
GQLGSASRARFGDVLPFLVKVLAADEPLSLQAHPSAEQAVEGYLREERMGIPVSSPVRNYRDTSHK 
PELLVALQPFEALAGFREAARTTELLRALAVSDLDPFIDLLSEGSDADGLRALFTTWITAPQPDIDVLV 
PAVLDGAIQYVSSGATEFGAEAK7VLELGERYPGDAGVLAALLLNRISLAPGEAIFLPAGNLHAYVRG 
FGVEVMANSDNVLRGGLTPKHVDVPELLRVLDFAPTPKARLRPPIRREGLGLVFETPTDEFAATLLVL 
DGDHLGHEVDASSGHDGPQILLCTEGSATVHGKCGSLTLQRGTAAWVAADDGPIRLTAGQPAKLFR
ATVGL 

>Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase TB.seq 3644897:3645973 MW:37840
SEQ ID NO:269 

LATHQVDAWLVGGKGTRLRPLTLSAPKPMLPTAGLPFLTHLLSRIAAAGIEHVILGTSYKPAVFEAEF 
GDGSALGLQIEYVTEEHPLGTGGGIANVAGKLRNDTAMVFNGDVLSGADLAQLLDFHRSNRADVTL
QLVRVGDPRAFGCVPTDEEDRWAFLEKTEDPPTDQINAGCYVFERNVIDRIPQGREVSVEREVFPA 
LLADGDCKIYGYVDASYWRDMGTPEDFVRGSADLVRGIAPSPALRGHRGEQLVHDGAAVSPGALU 
GGTWGRGAEIGPGTRLDGAVIFDGVRVEAGCVIERSIIGFGARIGPRALIRDGVIGDGADIGARCELL 
SGARVWPGVFLPDGGIRYSSDV

>Rv3368c - TB.seq 3780334:3780975 MW:23734 SEQ ID NO:270 

MTLNLSVDEVLTTTRSVRKRLDFDKPVPRDVUVIECLELALQAPTGSNSQGWQWVFVEDAAKKKAIA 
DVYLANARGYLSGPAPEYPDGDTRGERMGRVRDSATYLAEHMHRAPVLLIPCLKGREDESAVGGVS 
FWASLFPANWSFCL^RSRGLGSCWTTLHLLDNGEHKVADVLGIPYDEYSQGGLLPIAYTQGIDFRP 
AKRLPAESVTHWNGW

>Rv3382c lytB1 TB.seq 3796447:3797433 MW:34667 SEQ ID NO:271

MAEVFVGPVAQGYASGEVTVLLASPRSFCAGVERAIETVKRVLDVAEGPVYVRKQIVHNTWVAELR 
DRGAVFVEDLDEtPDPPPPGAWVFSAHGVSPAVRAGADERGLQWDATCPLVAKVHAEAARFAAR 
GDTWFIGHAGHEETEGTLGVAPRSTLLVQTPADVAALNLPEGTQLSYLTQ7TLALDETADVIDALRA
RFPTLGQPPSEDICYATTNRQRALQSMVGECDWLVIGSCNSSNSRRLVELAQRSGTPAYLIDGPDDI 
EPEWLSSVSTIGVTAGASAPPRLVGQVIDALRGYASITWERSIATETVRFGLPKQVRAQ 

>Rv3418c groES 10 kD chaperone TB.seq 3836985:3837284 MW:10773 SEQ ID NO:272 
VAKVNIKPLEDKILVQANEAETTTASGLVIPDTAKEKPQEGTWAVGPGRWDEDGEKRIPLDVAEGDT
VIYSKYGGTEIKYNGEEYLILSARDVLAWSK 
>Rv3423c alr TB.seq 3840193:3841416 MW:43357 SEQ ID NO:273

VKRFWENVGKPNDTTDGRGTTSU\MTPISQTPGLIj\EAiyWDLGAIEHNVR\(a.REHAGHAQLMAWK 
ADGYGHGATRVAQTALGAGAAELGVATVDEALALRADGITAPVLAWLHPPGIDFGPALLADVQVAVS 
SLRQLDELLHAVRRTGRTATVTVKVDTGLNRNGVGPAQFPAMLTALRQAMAEDAVRLRGLMSHMV 
YADKPDDSINDVQAQRFTAFLAQAREQGVRFEVAHLSNSSATMARPDLTFDLVRPGIAVYGLSPVPA
LGDMGLVPAMTVKCAVALVKSIRAGEGVSYGHTWIAPRDTNLALLPIGYADGVFRSLGGRLEVLINGR 
RCPGVGRICMDQFMVDLGPGPLDVAEGDEAILFGPGIRGEPTAQDWADLVGTIHYEWTSPRGRITR 
TYREAENR 

>Rv3490 otsA [alpha],-trehalose-phosphate synthase TB.seq 3908232:3909731 MW:55864
SEQ ID NO:274 

MAPSGGQEAQICDSETFGDSDFVWANRLPVDLERLPDGSTTWKRSPGGLVTALEPVLRRRRGAW 
VGVVPGVNDDGAEPDLHVLDGPIIQDELELHPVRLSTTDIAQYYEGFSNATLWPLYHDVIVKPLYHRE 
VVWDRYVDVNQRFAEAASRAAAHGATVVVVQDYQLQLVPKMLRMLRPDLTIGFFLHIPFPPVELFMQ 
MPWRTEIIQGLLGADLVGFHLPGGAQNFLILSRRLVGTDTSRGTVGVRSRFGAAVLGSRTIRVGAFPI
SVDSGALDHAARDRNIRRRAREIRTELGNPRKILLGVDRLDYTKGIDVRLKAFSELLAEGRVKRDDTV 
WQLATPSRERVESYQTLRNDIERQVGHINGEYGEVGHPWHYLHRPAPRDELIAFFVASDVMLVTP 
LRDGMNLVAKEYVACRSDLGGALVLSEFTGAAAELRHAYLVNPHDLEGVKDGIEEALNQTEEAGRR 
RMRSLRRQVLAHDVDRWAQSFLDALAGAHPRGQG 

>Rv3598c lysS lysyl-tRNA synthase TB.seq 4041423:4042937 MW:55678 SEQ ID NO:275 
VSAADTAEDLPEQFRIRRDKRARLLAQGRDPYPVAVPRTHTLAEVRAAHPDLPIDTATEDIVGVAGRV 
IFARNSGKLCFATLQDGDGTQLQVMISLDKVGQAALDAWKADVDLGDIVYVHGAVISSRRGELSXAJk 
DCWRIAAKSLRPLPVAHKEMSEESRVRQRYVDLIVRPEARAVARLRIAWRAIRTALQRRGFLEVETP 
VLQTLAGGAAARPFATHSNALDIDLYLRIAPELFLKRCIVGGFDKVFELNRVFRNEGADSTHSPEFSM
LETYQTYGTYDDSAWTRELIQEVADEAJGTRQLPLPDGSVYDIDGEWATIQMYPSLSVALGEEITPQT 
TVDRLRGIADSLGLEKDPAIHDNRGFGHGKLIEELWERTVGKSLSAPTFVKOFPVQTTPLTRQHRSIP 
GVTEKWDLYLRGIEIJ^TGYSELSDPWQRERFADQARAAAAGDDEAMVLDEDFIJ^EYGMPPCTG 
TGMGIDRLLMSLTGLSIRETVLFPIVRPHSN 

>Rv3600c - similar to Bacillus subtilis protein YacB TB.seq 4043041:4043856 MW:29274
SEQ ID NO:276 

VLl-AIDVRrfrHTWGLLSGMi<EHAKWQQVWIRTESE\n-ADEUU.TIDGLIGEDSERLTGTAALST^ 
VLHEVRIMLDQYWPSVPHVLIEPGVRTG!PLLVDNPKEVGADRIVNCU\AYDRFRKAAIWDFGSSICV 
DWSAKGEFLGGAIAPGVQVSSDAAAARSAALRRVEI-ARPRSWGKNTVECMQAGAVFGFAGLVDG
LVGRIREDVSGFSVDHDVAiVATGHTAPLLLPELHTVDHYDQHLTLQGLRLVFERNLEVQRGRLKTAR 
>Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase TB.seq
4048181:4048744 MW:20732 SEQ ID NO:277 

IV1TRWLSVGSNLGDRLARLRSVADGLGDALIAASPIYEADPWGGVEQGQFLNAVLIADDPTCEPREW 
LRRAQEFERAAGRVRGQRWGPRNLDVDLIACYQTSATEALVEVTARENHLTLPHPLAHLRAFVLIPW 
lAVDPTAQLTVAGCPRPVTRLLAELEPADRDSVRLFRPSFDLNSRHPVSRAPES

>Rv3607c folX may be involved in folate biosynthesis TB.seq 4048744:4049142 MW:14553

MADRIELRGLWHGRHGVYDHERVAGQRFVIDVT\AMDLAEAANSDDLADTYDYVRLASRAAEIVAG 

PPRKLIEWGAEIADHVIVIDDQRVHAVEVAVHKPQAPIPQTFDDVAWIRRSRRGGRGVVWPAGGAV 

>Rv3608c folP dihydropteroate synthase TB.seq 4049138:4049977 MW:28812 SEQ ID NO:278
VSPAPVQVMGVLNVTDDSFSDGGCYLDLDDAVKHGLAMAAAGAGIVDVGGESSRPGATRVDPAVE 
TSRVIPWKELAACaGITVSIDTMRADVARAALQNGAQMVNDVSGGRADPAMGPLI-AEADVPVVVLMH 
WRAVSADTPHVPVRYGNWAEVRADLLASVADAVAAGVDPARLVLDPGLGFAKTAQHNWAILHALP 
ELVATGIPVLVGASRKRFLGALLAGPDGVMRPTDGRDTATAVISALAALHGAWGVRVHDVRASVDAI 

KWEAWMGAERIERDG

>Rv3609c folE GTP cyclohydrolase I TB.seq 4049977:4050582 MW:22395 SEQ ID NO:279 
MSQLDSRSASARIRVFDQQRAEAAVRELLYAIGEDPDRDGLVATPSRVARSYREMFAGLYTDPDSVL 
NTMFDEDHDELVLVKEIPMYSTCEHHLVAFHGVAHVGYIPGDDGRVTGLSKIARLVDLYAKRPQVQE 
RLTSQIADALMKKLDPRGVIWIEAEHLCMAMRGVRKPGSVTTTSAVRGLFKTNAASRAEALDLILRK 

>Rv3610c ftsH inner membrane protein, chaperone TB.seq 4050601:4052880 MW:81987

MNRKNVTRTITAIANA/VLLGWSFFYFSDDTRGYKPVDTSVAITQINGDNVKSAQIDDREQQLRLILKKG 
NNETDGSEKViTKYPTGYAVDLFNALSAKNAKVSTWNQGSILGELLVYVLPLLLLVGLFVMFSRMQG 
GARMGFGFGKSRAKQLSKDMPKTTFADVAGVDEAVEELYEiKDFLQNPSRYQALGAKIPKGVLLYGP 
PGTGKTLLARAVAGEAGVPFFTISGSDFVEMFVGVGASRVRDLFEQAKQNSPCIIFVDEIDAVGRQR 

GAGLGGGHDEREQTLNQLLVEMDGFGDRAGVILIAATNRPDILDPALLRPGRFDRQIPVSNPDLAGR
RAVT.RVHSKGKPMAADADLDGIJ>lKRTVGMTGADIJ^INEAALLTARENGTVITGPALEEAVDRVIG 
GPRRKGRIISEQEKKITAYHEGGHTLAAWAMPDIEPIYKVTIU^RGRTGGHAVAVPEEDKGLRTRSEMI 
AQLVFAI\/IGGRAAEELVFREPTTGAVSDIEQATKIARSMVTEFGMSSKLGAVKYGSEHGDPFLGRT1\1 
GTQPDYSHEVAREIDEEVRKLIEAAHTEAWEILTEYRDVLDTI-AGELLEKETLHRPELESIFADVEKRP 

RLTMFDDFGGRIPSDKPPIKTPGELAJERGEPWPQPVPEPAFKAAIAQATQAAEAARSDAGQTGHGA
NGSPAGTHRSGDRQYGSTQPDYGAPAGWHAPGWPPRSSHRPSYSGEPAPTYPGQPYPTGQADP 
GSDESSAEQDDEVSRTKPAHG 

>Rv3671c - TB.seq 4112322:4113512 MW:40722 SEQ ID NO:280 

MTPSQWLDIAVLAVAFIAAISGWRAGALGSMLSFGGVLLGATAGVLLAPHIVSQISAPRAKLFAALFLIL 
ALNA/VGEVAGWLGRAVRGAIRNRPIRLIDSVIGVGVQLNAA/LTAAWLLAMPLTQSKEQPEUW^lVKG
SRVLARVNEAAPTWLKTVPKRLSALLNTSGLPAVLEPFSRTPVIPVASPDPALVNNPWAATEPSWKI 
RSLAPRCQKVLEGTGFVISPDRVMTNAHWAGSNNVTVYAGDKPFEATWSYDPSVDVAILAVPHLP 
PPPLVFAAEPAKTGADVWLGYPGGGNFTATPARIREAIRLSGPDIYGDPEPVTRDVYTIRADVEQGD

SGGPLIDLNGQVLGWFGAAIDDAETGFVLTAGEVAGQLAKIGATQPVGTGACVS 

>Rv3682 ponA2 TB.seq 4121913:4124342 MW:84637 SEQ ID NO:281

MPERLPAAITVLKLAGCCLLASWATALTFPFAGGLGLMSNRASEWANGSAQLLEGQVPAVSTMVD 
AKGNTIAWLYSQRRFEVPSDKIANTMKLAIVSIEDKRFADHSGVDWKGTLTGLAGYASGDLDTRGGS
TLEQQYVKNYQLLVTAQTDAEKRAAVETTPARKLREIRMALTLDKTFTKSEILTRYLNLVSFGNNSFG 
VQDAAQTYFGINASDLNWQCW^U^GMVQSTSTLNPYTNPDGALARRNVVLDTMIENLPGEAEALR 
AAKAEPLGVLPQPNELPRGCIAAGDRAFFCDYVQEYLSRAGISKEQVATGGYLIRTTLDPEVQAPVKA 
AIDKYASPNLAGISSVMSVIKPGKDAHKVLAMASNRKYGLDLEAGETMRPQPFSLVGDGAGSIFKIFT 

TAAALDMGMGINAQLDVPPRFQAKGLGSGGAKGCPKETWCVVI^AGNYRGSMNVTDAIJVTSPN^

AKLISQVGVGRAVDMAIKLGLRSYANPGTARDYNPDSNESLADFVKRQNLGSFTLGPIELNALELSNV 
AATLASGGVWCPPNPIDQLIDRNGNEVAVm'ETCDQWPAGLANTLANAMSKDAVGSGTAAGSAGA 
AGWDLPMSGKTGTTEAHRSAGFVGFTNRYAAANYIYDDSSSPTDLCSGPLRHCGSGDLYGGNEPS 
RTWFAAMKPIANNFGEVQLPPTDPRYVDGAPGSRVPSVAGLDVDAARQRLKDAGFQVADQTNSVN 

SSAKYGEWGTSPSGQTlPGSIVnQISNGIPPAPPPPPLPEDGGPPPPVGSQWEIPGLPPITIPLLAP
PPPAPPP 

>Rv3721c dnaZX DNA polymerase III, [gamma] (dnaZ) and [tau] (dnaX) TB.seq 4164995:4166728
MW:61892 SEQ ID NO:282 

VALYRKYRPASFAEWGQEHVTAPLSVALDAGRINHAYLFSGPRGCGKTSSARILARSLNCAQGPTA 
NPCGVCESCVSLAPNAPGSIDWELDAASHGGVDDTRELRDRAFYAPVQSRYRVFIVDEAHMVTTA
GFNALLKIVEEPPEHLIFIFATTEPEKVLPTIRSRTHHYPFRLLPPRTMRALLARICEQEGWVDDAVYP 
LVIRAGGGSPRDTLSVLDQLLAGAADTHVTYTRALGLLGVTDVALIDDAVDALAACDAAALFGAIESVI 
DGGHDPRRFATDLLERFRDLIVLQSVPDAASRGWDAPEDALDRMREQAARIGRATLTRYAEWQA 
GLGEMRGATAPRLLLEWCARLLLPSASDAESALLQRVERIETRLDMSiPAPQAVPRPSAAAAEPKHQ 
PAREPRPV1^TPASSEPWAAVRS^4WPWRDKVRLRSRTTEVM1J^GAWRALEDNTLVLTHESAPL
ARRLSEQRNADVLAEALKDALGVNWRVRCETGEPAAAASPVGGGANVATAKAVNPAPTANSTQRD 
EEEHMLAEAGRGDPSPRRDPEEVALELLQNELGARRIDNA 
>Rv3783 - TB.seq 4229255:4230094 MW:32337 SEQ ID NO:283 

MTFMDAQASFQTQSRTLARVRGDLVDGFRRHELVVLHLGWQDIKQRYRRSVLGPFWITIATGTTAVA 
30 MGGLYSKLFRLELSEHLPYVTLGLIVWNLINAAILDGAEVFVANEGLIKQLPAPLSVHVYRLVWRQMIF 
FAHNMYFVIAIIFPKPWSWADLSFLPALALIFLNCVVVVSLCFGILATRYRDIGPLLFSWQLLFFMTPII 
WNDETLRRQGAGRWSSIVELNPLLHYLDIVRAPLLGAHQELRHVVLWLVLTWGWMLAAFAMRQYR 
ARVPYWV 

>Rv3789 - TB.seq 4235371:4235733 MW:13378 SEQ ID NO:284 
MRFWTGGLAGIVDFGLYNA/LYKVAGLQVDLSKAISFIVGTITAYLINRRWTFQAEPSTARFVAVMLLY
GITFAVQVGLNHLCLALLHYRAWAIPVAFVIAQGTATVINFIVQRAVIFRIR
>Rv3790 - TB.seq 4235776:4237158 MW:50164 SEQ ID NO:285
MLSVGATTTATRLTGWGRTAPSVATnIVLRTPDAEMIVKAVARVAESGGGRGAIARGLGRSYGDNAQN
GGGLVIDMTPLNTIHSIDADTKLVDIDAGVNLDQLMKAALPFGLVVVPVLPGTRQVTVGGAIACDIHGK 
NHHSAGSFGNHVRSMDLLTADGEIRHLTPTGEDAELFWATVGGNGLTGIIMRATIEMTPTSTAYFIAD 
GDVTASLDETIALHSDGSEARYTYSSAWFDAISAPPKLGRAAVSRGRLATVEQLPAKLRSEPLKFDAP 
QLLTLPDVFPNGLANKYTFGPIGELVVnrRKSG'mRGKVQNLTQFYHPLDMFGEWNRAYGPAGFLQYQ
FVIPTEAVDEFKKIIGVIQASGHYSFLNVFKLFGPRNQAPLSFPIPGWNICVDFPIKDGLGKFVSELDRR 
VLEFGGRLYTAKDSRTTAETFHAMYPRVDEWISVRRKVDPLRVFASDMARRLELL 
>Rv3791 - TB.seq 4237162:4237923 MW:27470 SEQ ID NO:286 

M\A.DAVGNPQT\a.LLGGTSEIGLAICERYLHNSAARIVLACLPDDPRREDAAAAMKQAGARSVELIDF 
DALDTDSHPKMIEAAFSGGDVDVAIVAFGLLGD/i£ELWQNQRKAVQIAEINYTAAVSVGVLLAEKMR
AQGFGQIIAMSSAAGERVRRANFVYGSTKAGUDGFYLGLSEALREYGVRVLVIRPGQVRTRMSAHLK 
EAPLTVDKEYVANLAVTASAKGKELVWAPAAFRYVMMVLRHIPRSIFRKLPI 
>Rv3794 embA TB.seq 4243230:4246511 MW:115694 SEQ ID NO:287 

VPHDGNERSHRIARLAAWSGIAGLLLCGIVPLLPVNQTTATIFWPQGSTADGNITQITAPLVSGAPRA 

LDISIPCSAIATLPANGGLVLSTLPAGGVDTGKAGLFVRANQDTWVAFRDSVAAVAARSTIAAGGCS
ALHIWADTGGAGADFMGIPGGAGTLPPEKKPQVGGIFTDLXVGAQPGLSARVDIDTRFITTPGALKKA 
VMU.GVLAVLVANWGU\ALDRLSRGRTLRDV\rt.TRYRPR\mVGFASRUVDAAVIATLLLWHV^ 
bDGYLLTVARVAPKAGYVANYYRYFGTTEAPFDWYTS\/LAQLA^VSTAG^WMRLPATIJ^GIACWLI^ 
SRFVLRRLGPGPGGU^SNRVAVFrAGAVFLSAVNrt-PFNNGLRPEPLIALGVLVTWVLVERSIALGRLAP 

AAVAllVATLTATLAPQGLIALAPLLTGARAIAQRIRRRRATDGLLAPU>kVLAAALSLITVWFRDQTLA

AESARIKYKVGPTIAWYQDFLRYYFLTVESNVEGSMSRRFAVLVLLFCLFGVLFVLLRRGRVAGLASG 
PAWRLIGTTAVGLLLLTFTPTKWAVQFGAFAGLAGVLGAVTAFTFARIGLHSRRNLTLYVTALLFVLA 
WATSGINGWFYVGNYGVPWYDIQPVIASHPVTSMFLTLSILTGLLAAVVYHFRMDYAGHTEVKDNRR 
l^iRILASTPLLWAVIMVAGEVGSIVMKAAVFRYPLYTTAKANLTALSTGLSSCAMADDVLAEPDPNAGM 

LQPVPGQAFGPDGPLGGISPVGFKPEGVGEDLKSDPWSKPGLVNSDASPNKPNAAITDSAGTAGG
KGPVGINGSHAALPFGLDPARTPVMGSYGENNUWTATSAWYQLPPRSPDRPLWVSAAGAIWSYK 
EDGDFIYGQSLKLQWGVTGPDGRIQPLGQVFPIDIGPQPAWRNLRFPLAWAPPEADVARIVAYDPNL 
SPEQWFAFTPPRVPVLESLQRLIGSATPVLMDIATAANFPCQRPFSEHLGIAELPQYRILPDHKQTAA 
SSNLWQSSSTGGPFLFTQALLRTSTIATYLRGDWYRDWGSVEQYHRLVPADCaAPDAWEEGVITVP 

GWGRPGPIRALP

>Rv3795 embB TB.seq 4246511:4249804 MW:118023 SEQ ID NO:288

MTQCASRRKSTPNRAILGAFASARGTRWVATIAGLIGFVLSVATPLLPWQTTAMLDWPQRGQLGSV 
TAPLISLTPNADFTATVPCDWRAMPPAGGWLGTAPKQGKDANLQALFVWSAQRVDVTDRNWILS 
VPREQVTSPQCQRIEVTSTHAGTFANFVGLKDPSGAPLRSGFPDPNLRPQIVGVFTDLTGPAPPGLA 
VSATIDTRFSTRPTTLKLLAIIGAIVATWALIALWRLDQLDGRGSIAQLLLRPFRPASSPGGMRRLIPAS
WRTFTLTDAWIFGFLLWHVIGANSSDDGYILGMARVADHAGYMSNYFRWFGSPEDPFGWYYNLLA 
LMTHVSDASLWMRLPDLAAGLVCWLLLSREVLPRLGPAVEASKPAYWAAAMVLLTAWMPFNNGLR 
PEGIIALGSLVmVUERSMRYSRLTPAALAVVTAAFTLGVQPTGLIAVAALVAGGRPMLRILVRRHRLV
GTLPLVSPMLAAGTVILTWFADQTLSTVLEATR\mAKIGPSQAWYTENLRYYYLILP7VDGSL 
FLITALCLFTAVFIMLRRKRIPSVARGPAWRLMGVIFGTMFFLMFTPTKWVHHFGLFAAVGAAMAALT 
7VLVSPSVLRWSRNRMAFLAALFFLLALCWATTNGVVVVYVSSYGVPFNSAMPKIDGITVSTIFFALFAI
AAGYAAWLHFAPRGAGEGRLIRALTTAPVPIVAGFMAAVFVASMVAGIVRQYPTYSNGWSNVRAFV
GGCGLADDVLVEPDTNAGFMKPLDGDSGSWGPLGPLGGVNPVGFTPNGVPEHTVAEAIVMKPNQP 
GTDYDWDAPTKLTSPGINGSTWLPYGLDPARVPUkGTYTTGAQQQSTLVSAVmJ-PKPDDGHPLV 
WTAAGKIAGNSVLHGYTPGQTWLEYAMPGPGALVPAGRMVPDDLYGEQPKAWRNLRFAFIAKMP 
ADAVAVRWAEDLSLTPEDWIAVTPPRVPDLRSLQEYVGSTQPVLLDWAVGLAFPCQQPMLHANGIA 
EIPKFRITPDYSAKKLDTDTVVEDGTNGGLLGITDLLLRAHVMATYLSRDWARDWGSLRKFDTLVDAP
PAQLELGTATRSGLWSPGKIRIGP

>Rv3834c serS seryl-tRNA synthase TB.seq 4307655:4308911 MW:45293 SEQ ID NO:289
VIDLKLLRENPDAVRRSQLSRGEDPALVDALLTADAARRAVISTADSLRAEQKAASKSVGGASPEERP 
PLLRRAKELAEQVKAAEADEVEAEAAFTAAHLAISNVIVDGVPAGGEDDYAVLDWGEPSYLENPKD 
HLELGESLGUDMQRGAKVSGSRFYFLTGRGALLQLGLLQLAIJ<LA\mNGFVPTIPPVLVRPEVMVGT
GFLGAHAEEVYRVEGDGLYLVGTSEVPLAGYHSGEILDLSRGPLRYAGWSSCFRREAGSHGKDTRG 
IIRVHQFDKVEGFWCTPADAEHEHERLLGWQRQMLARIEVPYRVIDVAAGDLGSSAARKFDCEAWI 
PTQGAYRELTSTSNCTTFQARRLATRYRDASGKPQIAATLNGTLATTRWLVAILENHQRPDGSVRVP 
DALVPFVGVEVLEPVA 

>Rv3907c pcnA polynucleotide polymerase TB.seq 4391631:4393070 MW:53057 SEQ ID NO:290
VPEAVQEADLLTAAAVALNRHAALLRELGSVFAAAGHELYLVGGSVRDALLGRLSPDLDFTTDARPE
RVQEIVRPWADAVWDTGIEFGTVGVGKSDHRMEITTFRADSYDRVSRHPEVRFGDCLEGDLVRRDF
TTNAMAVRVTATGPGEFLDPLGGLAALRAKVLDTPAAPSGSFGDDPLRMLRAARFVSQLGFAVAPR 
VRAAIEEMAPQLARISAERVAAELDKLLVGEDPAAGIDLMVQSGMGAWLPEIGGMRMAIDEHHQHK 

DVYQHSLTVLRQAIALEDDGPDLVLRWAALLHDIGKPATRRHEPDGGVSFHHHEWGAKMVRKRMR
ALKYSKQMIDDISQLVYLHLRFHGYGDGKWTDSAVRRYVTDAGALLPRLHKLVRADCTTRNKRRAAR 
LQASYDRLEERIAELAAQEDLDRVRPDLDGNQIMAVLDIPAGPQVGEAWRYLKELRLERGPLSTEEA 
TTELLSWWKSRGNR 

A number of embodiments of the invention have been described. Nevertheless,
it will be understood that various modifications may be made without departing from
the spirit and scope of the invention. Accordingly, other embodiments are within the scope
of the following claims.

WHAT IS CLAIMED IS:

1. A method for identifying a nucleic acid or a polypeptide sequence that
may be a target for a drug comprising the following steps:

(a) providing a first nucleic acid or a polypeptide sequence that is known to
be a drug target;

(b) providing at least one algorithm selected from the group consisting of a "domain
fusion" method, a "phylogenetic profile" method and a "physiologic linkage" method,
wherein the algorithm is capable of analyzing a functional relationship between nucleic acid or
polypeptide sequences; and

(c) comparing the first nucleic acid or the polypeptide drug target sequence to a
plurality of sequences using at least one of the algorithms as set forth in step (b) to identify a
second sequence that has a functional relationship to the first sequence, thereby identifying a
nucleic acid or a polypeptide sequence that may be a target for a drug.

2. A method for identifying a nucleic acid or a polypeptide sequence that
may be essential for the growth or viability of an organism comprising the following steps:

(a) providing a first nucleic acid or a polypeptide sequence that is known to
be essential for the growth or viability of an organism;

(b) providing at least one algorithm capable of analyzing a functional relationship
between nucleic acid or polypeptide sequences selected from the group consisting of a
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage"
method; and

(c) comparing the first nucleic acid or the polypeptide sequence to a plurality of
sequences using at least one of the algorithms as set forth in step (b) to identify a second
sequence that has a functional relationship to the first sequence, thereby identifying a nucleic
acid or a polypeptide sequence that may be essential for the growth or viability of an
organism.

3. The method of claim 1 or claim 2, wherein the drug is an anti-microbial
drug.

4. The method of claim 1 or claim 2, wherein the first nucleic acid or a
polypeptide sequence is derived from a pathogen.

5. The method of claim 4, wherein the pathogen is a microorganism. 

6. The method of claim 1 or claim 2, wherein the microorganism is 
Mycobacterium tuberculosis (MTB). 

7. The method of claim 1 or claim 2, wherein the plurality of sequences
used to identify a second sequence comprises a database of the gene sequences of an entire
genome of an organism.

8. The method of claim 1 or claim 2, wherein the plurality of sequences
used to identify a second sequence comprises a database of the gene sequences derived from
a pathogen.

9. The method of claim 1 or claim 2, wherein the "phylogenetic profile"
method algorithm comprises

(a) obtaining data, comprising a list of proteins from at least two genomes;

(b) comparing the list of proteins to form a protein phylogenetic profile for
each protein, wherein the protein phylogenetic profile indicates the presence or absence of a
protein belonging to a particular protein family in each of the at least two genomes based on
homology of the proteins; and

(c) grouping the list of proteins based on similar profiles, wherein proteins
with similar profiles are indicated to have a functional relationship.
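
By way of illustration only, steps (a) through (c) of the phylogenetic profile method can be sketched as follows. The sketch assumes the homology search has already been run and that its result is available as a mapping from each protein to the genomes in which a homolog was found. The protein names, genome labels, and function names are hypothetical, and grouping by exactly identical profiles is a simplifying assumption (a looser similarity measure could be substituted).

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Return {protein: tuple of 0/1}, with one bit per genome.

    `hits` maps each protein to the set of genomes in which a homolog
    was found; the homology search itself is assumed to be done already.
    """
    return {
        protein: tuple(1 if g in found else 0 for g in genomes)
        for protein, found in hits.items()
    }


def group_by_profile(profiles):
    """Group proteins whose presence/absence profiles are identical."""
    groups = defaultdict(list)
    for protein, profile in sorted(profiles.items()):
        groups[profile].append(protein)
    return dict(groups)


# Hypothetical toy data, for illustration only.
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
hits = {
    "P1": {"genomeA", "genomeC"},
    "P2": {"genomeA", "genomeC"},
    "P3": {"genomeB"},
    "P4": {"genomeA", "genomeB", "genomeC", "genomeD"},
}

profiles = build_profiles(hits, genomes)
groups = group_by_profile(profiles)
# Proteins sharing a profile are candidates for a functional relationship.
linked = [members for members in groups.values() if len(members) > 1]
print(linked)
```

In this toy input only P1 and P2 share a profile, so they form the single candidate group.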

10. The method of claim 9, wherein the phylogenetic profile is in the form 
of a vector, matrix or phylogenetic tree. 

11. The method of claim 9, comprising determining the significance of
homology between the proteins by computing a probability (p) value threshold.

12. The method of claim 11, wherein the probability is set with respect to
the value 1/NM, based on the total number of sequence comparisons that are to be
performed, wherein N is the number of proteins in the first organism's genome and M in all
other genomes.
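
As a rough illustration of the 1/NM threshold, the sketch below computes the cutoff and applies it to a list of scored comparisons. The motivation is that, with N x M comparisons, a cutoff of 1/NM corresponds to roughly one chance match expected over the whole search. The counts, hit tuples, and function names are hypothetical and are not taken from the specification.

```python
def significance_threshold(n_query_proteins, m_other_proteins):
    """Return 1/(N*M), the p-value cutoff for N*M pairwise comparisons."""
    if n_query_proteins <= 0 or m_other_proteins <= 0:
        raise ValueError("protein counts must be positive")
    return 1.0 / (n_query_proteins * m_other_proteins)


def significant_hits(hits, threshold):
    """Keep (query, subject, p_value) hits whose p-value is below the cutoff."""
    return [hit for hit in hits if hit[2] < threshold]


# Hypothetical counts: N proteins in the first genome, M in all others.
N = 4000
M = 50000
threshold = significance_threshold(N, M)  # 1 / 200,000,000

# Hypothetical comparison results.
hits = [("q1", "s1", 1e-12), ("q1", "s2", 1e-6), ("q2", "s3", 4e-9)]
kept = significant_hits(hits, threshold)
print(threshold, kept)
```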

13. The method of claim 9, wherein the presence or absence is determined by
calculating an evolutionary distance.

14. The method of claim 13, wherein the evolutionary distance is
calculated by:

(a) aligning two sequences from the list of proteins;

(b) determining an evolution probability process by constructing a conditional
probability matrix: $p(aa \to aa')$, where $aa$ and $aa'$ are any amino acids, said conditional
probability matrix being constructed by converting an amino acid substitution matrix from a
log odds matrix to said conditional probability matrix;

(c) accounting for an observed alignment of the constructed conditional
probability matrix by taking the product of the conditional probabilities for each aligned pair
during the alignment of the two sequences, represented by
$P(p) = \prod_n p(aa_n \to aa'_n)$; and

(d) determining an evolutionary distance $\alpha$ from powers equation
$p' = p^{\alpha}(aa \to aa')$, maximizing for $P$.
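
A minimal sketch of steps (c) and (d) follows, under loud simplifying assumptions: a two-letter toy alphabet stands in for the twenty amino acids, the conditional probability matrix is invented for the example, and the distance is restricted to integer powers of the matrix so that plain matrix multiplication suffices. The function names are hypothetical. The sketch maximizes the logarithm of the product over aligned pairs, which has the same maximizer as the product itself.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def log_likelihood(matrix, index, aligned_pairs):
    """Sum of log p(aa -> aa') over all aligned residue pairs."""
    return sum(math.log(matrix[index[a]][index[b]]) for a, b in aligned_pairs)


def evolutionary_distance(p, alphabet, seq1, seq2, max_alpha=50):
    """Return the integer alpha in 1..max_alpha maximizing P under p**alpha."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    index = {aa: i for i, aa in enumerate(alphabet)}
    pairs = list(zip(seq1, seq2))
    best_alpha, best_ll = None, -math.inf
    power = p  # holds p**alpha for the current alpha
    for alpha in range(1, max_alpha + 1):
        ll = log_likelihood(power, index, pairs)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
        power = mat_mul(power, p)
    return best_alpha


# Hypothetical toy conditional probability matrix (each row sums to 1).
p = [[0.9, 0.1],
     [0.1, 0.9]]

same = evolutionary_distance(p, "AB", "AAAAABBBBB", "AAAAABBBBB")
diverged = evolutionary_distance(p, "AB", "AAAAABBBBB", "ABAAABBBBA")
print(same, diverged)
```

Identical sequences give the smallest distance considered, while the pair with two substitutions in ten positions is best explained by a higher power of the matrix.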

15. The method of claim 14, wherein the conditional probability matrix is 
defined by a Markov process with substitution rates, over a fixed time interval. 






16. The method of claim 14, where the conversion from an amino acid
substitution matrix to a conditional probability matrix is represented by:

$P(i \to j) = p(j)\, 2^{\mathrm{BLOSUM62}_{ij}/2}$,

where BLOSUM62 is an amino acid substitution matrix, and $P(i \to j)$ is the
probability that amino acid i is replaced by amino acid j through point mutations according to
BLOSUM62 scores.

17. The method of claim 16, where the $p(j)$'s are the abundances of amino acid
j and are computed by solving a plurality of linear equations given by the normalization
condition that:

$\sum_j P(i \to j) = 1$.
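
One plausible reading of this normalization can be sketched numerically. The sketch assumes that the conversion has the form P(i -> j) = p(j) * 2**(S_ij / 2), with S a log-odds score matrix in half-bit units, and that normalization means each row of the conditional matrix sums to one. The three-letter score matrix is a hypothetical toy built from an invented joint distribution, not real BLOSUM62 values, and the function names are illustrative.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def abundances_from_scores(scores):
    """Solve sum_j p_j * 2**(S_ij / 2) = 1 (one equation per i) for p_j."""
    weights = [[2.0 ** (s / 2.0) for s in row] for row in scores]
    return solve_linear(weights, [1.0] * len(scores))


def conditional_matrix(scores, p):
    """Return P(i -> j) = p_j * 2**(S_ij / 2) as a list of rows."""
    return [[p[j] * 2.0 ** (s / 2.0) for j, s in enumerate(row)]
            for row in scores]


# Hypothetical symmetric joint distribution over a 3-letter alphabet.
joint = [[0.30, 0.06, 0.04],
         [0.06, 0.20, 0.04],
         [0.04, 0.04, 0.22]]
background = [sum(row) for row in joint]
# Toy log-odds scores in half-bit units: S_ij = 2 * log2(q_ij / (p_i p_j)).
scores = [[2.0 * math.log2(joint[i][j] / (background[i] * background[j]))
           for j in range(3)] for i in range(3)]

p = abundances_from_scores(scores)
P = conditional_matrix(scores, p)
print(p)
```

Because the toy scores were generated from a known background distribution, solving the linear system recovers that distribution, and every row of the resulting conditional matrix sums to one.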



18. The method of claim 1 or claim 2, wherein the "physiologic linkage"
method algorithm identifies proteins and nucleic acids that participate in a common
functional pathway.

19. The method of claim 1 or claim 2, wherein the "physiologic linkage"
method algorithm comprises identifying proteins and nucleic acids that participate in the
synthesis of a common structural complex.



20. The method of claim 1 or claim 2, wherein the "physiologic linkage"
method algorithm comprises identifying proteins and nucleic acids that participate in a
common metabolic pathway.


21. The method of claim 1 or claim 2, wherein the "domain fusion"
method algorithm comprises

(a) aligning a first primary amino acid sequence of multiple distinct non-homologous
polypeptides to second primary amino acid sequence of a plurality of proteins; and

(b) for any alignment found between the first primary amino acid sequences of all of
such multiple distinct non-homologous polypeptides and at least one protein of the second
primary amino acid sequences, outputting an indication identifying the aligned second
primary amino acid sequence as an indication of a functional link between the aligned first
and second polypeptide sequences.
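
The domain fusion steps above can be illustrated with a small sketch that starts from alignment results rather than performing the alignments. It assumes each hit records which region of a candidate composite protein a query polypeptide aligned to, and it treats two queries as linked when they align to non-overlapping regions of the same composite protein. The hit format, the protein names, and the use of an explicit set of known homologous pairs are assumptions made for the example.

```python
from itertools import combinations


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def find_fusion_links(hits, homologous_pairs=frozenset()):
    """Return {(query1, query2): [composite, ...]} for putative fusions.

    `hits` is a list of (query, composite, start, end), meaning the query
    aligned to positions start..end of the composite protein.
    `homologous_pairs` holds frozensets of query names known to be homologs.
    """
    by_composite = {}
    for query, composite, start, end in hits:
        by_composite.setdefault(composite, []).append((query, (start, end)))

    links = {}
    for composite, entries in sorted(by_composite.items()):
        for (q1, r1), (q2, r2) in combinations(sorted(entries), 2):
            if q1 == q2 or frozenset((q1, q2)) in homologous_pairs:
                continue  # same or homologous queries carry no fusion signal
            if overlaps(r1, r2):
                continue  # both queries cover the same region of the composite
            links.setdefault((q1, q2), []).append(composite)
    return links


# Hypothetical alignment hits, for illustration only.
hits = [
    ("geneA", "fusedAB", 0, 120),
    ("geneB", "fusedAB", 130, 300),
    ("geneC", "fusedAB", 10, 100),
    ("geneA", "otherX", 0, 110),
]
homologous_pairs = {frozenset(("geneA", "geneC"))}

links = find_fusion_links(hits, homologous_pairs)
print(links)
```

Here geneA and geneB are linked through fusedAB, as are geneB and geneC, while geneA and geneC are skipped because they are listed as homologs of each other.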

22. The method of claim 21, wherein the aligning is performed by an
algorithm selected from the group consisting of a Smith-Waterman algorithm, Needleman-Wunsch
algorithm, a BLAST algorithm, a FASTA algorithm, and a PSI-BLAST algorithm.

23. The method of claim 21, wherein the multiple distinct non-homologous
polypeptides are obtained by translating a nucleic acid sequence from a genome
database.

24. The method of claim 21, wherein the plurality of proteins have a
known function.

25. The method of claim 21, wherein at least one of the multiple distinct
non-homologous polypeptides has a known function.

26. The method of claim 21, wherein at least one of the multiple distinct
non-homologous polypeptides has an unknown function.

27. The method of claim 21, wherein the alignment is based on the degree
of homology of the multiple distinct non-homologous polypeptides to the plurality of
proteins.

28. The method of claim 21, further comprising determining the
significance of the aligned and identified second primary amino acid sequence by computing 
a probability (p) value threshold. 


29. The method of claim 28, wherein the probability threshold is set with 
respect to the value 1/NM, based on the total number of sequence comparisons that are to be 
performed, wherein N is the number of proteins in a first organism's genome and M in all 
other genomes. 


30. The method of claim 21, further comprising filtering excessive
functional links between one first primary amino acid sequence of multiple distinct
non-homologous polypeptides and an excessive number of other distinct non-homologous
polypeptides for any alignment found between the first primary amino acid sequences of the
distinct non-homologous polypeptides and at least one of the second primary amino acid
sequences of the plurality of proteins.
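
The filtering step can be pictured with the following sketch, which drops every link that involves a polypeptide connected to more partners than a chosen cutoff. The cutoff value, the link representation (unique unordered pairs), and the names are hypothetical; the claim itself does not fix what counts as excessive.

```python
from collections import Counter


def filter_promiscuous(links, max_partners):
    """Drop links touching any protein linked to more than max_partners others.

    `links` is a list of unique (protein_a, protein_b) pairs.
    """
    partners = Counter()
    for a, b in links:
        partners[a] += 1
        partners[b] += 1
    return [(a, b) for a, b in links
            if partners[a] <= max_partners and partners[b] <= max_partners]


# Hypothetical links: "hub" is linked to four partners.
links = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4"),
         ("p1", "p2")]
kept = filter_promiscuous(links, max_partners=3)
print(kept)
```

With a cutoff of three partners, all links through the hub are removed and only the link between p1 and p2 remains.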


31. A computer program product, stored on a computer-readable medium,
for identifying a nucleic acid or a polypeptide sequence that may be a target for a drug, the
computer program product comprising instructions for causing a computer system to be
capable of:

(a) inputting a first nucleic acid or a polypeptide sequence that is known to be
a drug target;

(b) accessing at least one algorithm capable of analyzing a functional relationship
between nucleic acid or polypeptide sequences selected from the group consisting of a
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage"
method; and

(c) comparing the first nucleic acid or the polypeptide drug target sequence to
a plurality of sequences using at least one of the algorithms set forth in step (b) to identify a
second sequence that has a functional relationship to the first sequence and generating an
output identifying a nucleic acid or a polypeptide sequence that may be a target for a drug.


32. A computer program product, stored on a computer-readable medium,
for identifying a nucleic acid or a polypeptide sequence that may be essential for the growth
or viability of an organism, the computer program product comprising instructions for
causing a computer system to be capable of:

(a) providing a first nucleic acid or a polypeptide sequence that is known to
be essential for the growth or viability of an organism;

(b) accessing at least one algorithm capable of analyzing a functional relationship
between nucleic acid or polypeptide sequences selected from the group consisting of a
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage"
method; and

(c) comparing the first nucleic acid or the polypeptide sequence to a plurality of
sequences using at least one of the algorithms set forth in step (b) to identify a second
sequence that has a functional relationship to the first sequence and generating an output
identifying a nucleic acid or a polypeptide sequence that may be essential for the growth or
viability of an organism.

33. A computer system, comprising: 

(a) a processor; and 

(b) a computer program product as set forth in claim 31 or claim 32.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5